Preface


This volume contains the proceedings of the two-day departmental seminar organised by 
the Computer Applications Department of Vidya Academy of Science & Technology during 
4-5 April 2018. The seminar was the culmination of a course (with course code MCA
2010 506(P) Seminar) to be completed by the MCA students of Calicut University during 
the Fifth Semester of the MCA programme. As part of the course, each student has to 
prepare and present a paper on any topic in the field of computer science. 

As part of the decennial celebrations of the Department (the Computer Applications
Department of the College was established in the academic year 2006-07), it was decided
to introduce some steps to enhance the learning experience of the students. As part of
this, the seminar course was offered in a new format. In the new format, the teachers
identified the areas in which the students were to work and provided the students with
some initial learning materials in the form of papers. After the initial reading of these
materials, the students had to search for additional reading materials themselves. The
students were required to study the papers and present a “study paper” in a Departmental
Seminar. The papers collected in this volume are the study papers prepared by the
students and presented in the two-day seminar. They are indicative of the level of
achievement of the students.

As part of the learning process, the students were also required to present the paper 
in the IEEE conference paper format. To facilitate this, the students were given a basic 
introduction to the LaTeX software and the IEEEtran document style.

The emphasis in the whole exercise was on giving the students hands-on experience in
preparing a conference/seminar paper and not on making the students learn a new topic or
subject in depth. The expected learning outcomes include:


• understanding of the structure of a research paper,
• awareness about the process of literature survey,
• basic knowledge about the accurate preparation of a bibliography and its citations in
  the paper,
• exposure to the IEEE format for the preparation of conference/journal papers,
• introduction to the concepts of “Abstracts”, “Keywords”, and the like,
• experience in applying these concepts by actually preparing a paper, and
• methodology of presenting a multi-author paper in a seminar/conference.


The articles compiled in these Proceedings are not even moderately edited. The editors
have only ensured that the basic learning outcomes outlined above have been met. However,
the editors have tried to ensure that the titles of chapters, sections, etc., the abstract,
figure and table captions, and the like are as per IEEE guidelines. The references have not
been checked for accuracy and completeness. The papers have not been edited for grammar,
punctuation, spelling or style.¹

The present work is only a record of the activities of the course referred to above and
it is prepared only for private circulation. To the best of our understanding, the authors of
the papers have given proper attribution to the ideas and material presented in the papers.
If any attributions are missing or improper, this was unintentional. The contents
have not been subjected to plagiarism tests.

It is believed that the teachers as well as the students very much enjoyed the new format
of the seminar course. There is still much scope for improvement. It is our hope that
future batches of students will have a stronger and wider learning experience from similar
seminar courses.


April 2018 Editors 


¹For different models of editing, see, for example, “IEEE Editorial Style Manual”, [Online]. Available:
https://www.ieee.org/documents/style_manual.pdf (April 2017).
A Study on Fields Impacted by Cheminformatics


Amrutha K R, Aswathy C U, Neethu K J 
Department of Computer Applications 
Vidya Academy of Science & Technology 
Thrissur-680501 


Abstract—This paper presents a study on the fields impacted 
by cheminformatics. Cheminformatics is the application of computer
and information techniques to a range of problems
in the field of chemistry. The major task of cheminformatics is
to support and optimize the drug discovery process. Cheminformatics
tools help medicinal chemists to better understand the
complex structures of chemical compounds. Virtual screening
is a technique in cheminformatics used for drug discovery.
Cheminformatics plays a role in studying the chemical components
of biological systems. Cheminformatics mainly focuses on
chemical information. Chemical data consists of data tables of
chemical properties. Cheminformatics work is carried out with the
help of software such as Chemwindow, Alchemy, Chemdraw
and CLIFF.

Index Terms—Security Issues, Challenges and Solutions for 
E-Commerce Applications over Web 


I. INTRODUCTION 


In cheminformatics, chemistry and information technology
are coupled together to handle problems in the field of
chemistry. It is also known as chemoinformatics,
chemioinformatics and chemical informatics. The tasks of
cheminformatics include the design, creation, mining,
organization, management, analysis, visualization, storage
and retrieval of chemical data, as well as drug discovery and
virtual libraries. The term chemoinformatics was defined by
F.K. Brown in 1998 [1].

The major task of cheminformatics is to support and optimize
the drug discovery process. Actual drug design is a
time-consuming and very expensive task. Cheminformatics
deals with discovering drugs using modern drug discovery
techniques, and it identifies complex issues in the traditional
drug discovery system. Cheminformatics tools help medicinal
chemists to better understand the complex structures of
chemical compounds. Available chemical data is refined and
converted into knowledge for the important decisions in
drug discovery; this helps in predicting and developing
new drugs. Developing a new drug is not a simple task. Virtual
screening, machine learning, deep learning and clustering
algorithms are used to support the drug discovery process.

Clustering algorithms play an important role in cheminformatics,
mainly in the drug discovery process. Clustering is
a process which partitions a given data set into homogeneous
groups based on given features, such that similar objects are
kept in one group whereas dissimilar objects are in different
groups. Clustering methods are of two types, hierarchical and
non-hierarchical. Non-hierarchical algorithms are faster than
hierarchical algorithms. The Jarvis-Patrick algorithm is a
non-hierarchical clustering algorithm used in drug discovery; it
has many applications in cheminformatics. The cheminformatics
problem of identifying a suitable drug for a receptor/target can
be solved using the Quantitative Structure Activity Relationship
(QSAR) [2] approach. QSARs are mathematical relationships
linking chemical structure and pharmacological activity in a
quantitative manner for a series of compounds.
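
As a minimal sketch of what such a quantitative relationship can look like, the following Python fragment fits a one-descriptor linear model (activity ≈ slope × descriptor + intercept) by ordinary least squares. The descriptor values and activities are made-up illustration data and the function names are hypothetical; practical QSAR models usually combine many descriptors and more careful statistics.

```python
def fit_linear_qsar(descriptor, activity):
    """Fit activity = slope * descriptor + intercept by least squares."""
    n = len(descriptor)
    mean_x = sum(descriptor) / n
    mean_y = sum(activity) / n
    # Sum of squares of the descriptor and its cross-product with activity.
    sxx = sum((x - mean_x) ** 2 for x in descriptor)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(descriptor, activity))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_activity(model, descriptor_value):
    """Predict the activity of a new compound from its descriptor value."""
    slope, intercept = model
    return slope * descriptor_value + intercept


# Made-up training series: one descriptor value and one measured activity
# per compound (illustration data only, not taken from any study).
descriptor_values = [1.0, 2.0, 3.0, 4.0]
measured_activity = [3.0, 5.0, 7.0, 9.0]

model = fit_linear_qsar(descriptor_values, measured_activity)
print(model, predict_activity(model, 5.0))
```

The fitted model can then be used to rank untested compounds of the same series by predicted activity.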

Reaction databases hold a collection of information about
products, educts and reaction mechanisms. The complex task of
predicting chemical reactivity uses reaction databases; it is
one of the main tasks of cheminformatics. Reaction prediction
and knowledge mining take a lot of time.
The big challenge in cheminformatics is the understanding of
chemical reactions.

Cheminformatics mainly focuses on chemical information.
Chemical data consists of data tables of chemical properties.
It is difficult to visualize many records and many columns
and to integrate the data. Information visualization is
integrated into bioinformatics and cheminformatics. Data
visualization extracts information from large amounts of raw
data; it is a technique for creating images, diagrams, or
animations to communicate a message. Most people are not able
to visualize data in more than three dimensions, so data
visualization helps them by presenting the data visually.
Visualization is useful for extracting useful information from
real-life data. Analysing multivariate datasets in
cheminformatics is difficult due to the large amount of data and
inherent noise. Virtual machines are used in cheminformatics
for developing and testing software for multiple systems.

Virtual screening is a technique in cheminformatics used for
drug discovery by searching large libraries of molecule
structures [3]. The computational screening of molecular databases
for binding partners is called virtual screening. It has become
popular in the drug discovery process and can reduce costs in
many respects. The generation phase of the drug development
process has a component named virtual screening that helps in
ranking the drug molecules based on some predicted property [4].
Virtual screening is classified into structure-based and
ligand-based methods. Successful and popular methods are the
Support Vector Machine (SVM) [5] and Binary Kernel Discriminant
Analysis (BKD) [6]. The Extreme Learning Machine (ELM) is another
such method that has been recently introduced [7]. The Weighted
Similarity Extreme Learning Machine (WELM) algorithm has been
proposed for the virtual screening task. This algorithm is
powerful, non-iterative, and easy to program.

Cheminformatics plays a role in studying the chemical
components of biological systems. Chemical compounds have
biological activity on animal genes. This activity can be
discovered using machine learning methodology. Machine learning
is a technique that uses cognitive science, computer science,
pattern recognition and statistics for the classification of
chemical data. Deep learning is a part of the family of machine
learning algorithms. Deep learning imitates the workings of
the human brain in processing data and creating patterns for
use in decision making. It is a subset of machine learning
in Artificial Intelligence (AI) that has networks which are
capable of learning unsupervised from data that is unstructured
or unlabelled.

In the last few years, deep learning technology has improved
in practice through the use of GPUs and scientific big data, and
it is having a significant impact on various areas of modern
society: speech and audio recognition, visual recognition,
object detection, and also drug discovery and genomics [8].

Chemical informatics is rich with tools such as LogCHEM,
MMsINC, Metrabase, ChemDes, the Open Drug Discovery Toolkit
(ODDT), PubChem and big data technologies, which support the
storage and prediction of chemical data and structures.
LogCHEM is an inductive logic programming (ILP) based
tool. It is used for the discriminative interactive mining of
chemical fragments. LogCHEM is capable of storing
information on atoms and on their locations, and it can take
advantage of a large number of search algorithms.
MMsINC is a free web-oriented database used for virtual
screening and cheminformatics applications; it contains a large
number of chemical compounds in 3D formats. The PubChem
and PDB databases are integrated with MMsINC. Metrabase
is a freely accessible database for small molecule
transporter data analysis and (Q)SAR modeling. OrChem
deals with the registration and indexing of chemical structures
to support fast substructure and similarity searching. ChemDes
is an integrated web-based platform for molecular descriptor
and fingerprint computation. SMILES is a general-purpose
chemical nomenclature and data exchange format. Data set
curation in cheminformatics is often ignored. The KU Chemical
Biology Database (KUChemBio) established a collection of 69
datasets for experiments to address this situation.
Cheminformatics work is carried out with the help of software
such as Chemwindow, Alchemy, Chemdraw and CLIFF.


II. DRUG DISCOVERY 


Drug discovery is the process by which new candidate
medications are discovered. Historically, drugs were discovered
by identifying the active ingredient of traditional
remedies. Modern drug discovery involves the identification of
screening hits, medicinal chemistry, and optimization of those
hits to increase their affinity, selectivity and metabolic stability.


Fig. 1. Drug discovery (diagram labels: rough set based rule reduction; drugs with similar property).


A. Rough set-based rule reduction 


Here a computational framework for HIV drug discovery is
produced by combining cheminformatics and rough sets. In this
technique, rough set-based rule induction is used to compare
the rule sets from different categories of an HIV drug database
and a general drug database. This comparison leads to the
discovery of drugs which have properties similar to those of the
HIV drugs; the detected drugs are then forwarded for further
chemical testing.

Because HIV is spreading at an increasing speed,
more drugs with similar properties have to be identified among
general drugs so that they can be used as a cure for HIV. The
method is based on the principle that if two drugs have similar
physical properties, then they may have similar biological
activity [9].


B. Clustering analysis 


The goal of cluster analysis is to classify a set of objects
in such a way that objects in the same group are more similar
to each other than to those in other groups. It is used in many
fields, including data mining, statistical data analysis, machine
learning, pattern recognition, image analysis, information
retrieval, bioinformatics, data compression, and computer graphics.

A fuzzy Kohonen SOM implementation and the clustering of
bioactive compound structures are used for drug discovery.
Fuzzy clustering, or soft clustering, is a form of clustering in
which each data point can belong to more than one cluster.
These methods are computationally very intensive and are not
suitable for large databases. The fuzzy SOM (Self-Organizing
Map) is suitable for clustering small bioactive compound
data sets and for optimizing parameters such as the learning
rate and the neighborhood size. The fuzzy SOM method may be
evaluated on large chemical datasets having tens of thousands of
compounds, where other descriptors such as physicochemical
properties and 3D descriptors can also be used [10].

In chemical information systems, most clustering
algorithms are of two types: hierarchical, such as the single
linkage, complete linkage, Ward's and group average algorithms,
and non-hierarchical, such as k-means and Jarvis-Patrick. The
result of a performance study of an algorithm depends upon the
parameters used in the study. In one study, the best result
among the hierarchical methods was produced by Ward's
hierarchical agglomerative method, and Jarvis-Patrick produced
the best results among the non-hierarchical methods tested
[11]. However, in another study [12] the performance of the
Jarvis-Patrick method was found to be very poor, and the
performance of Ward's method was the best. The good performance
of Ward's method was confirmed in another study [13].
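
To make the non-hierarchical case concrete, the sketch below follows one common formulation of Jarvis-Patrick clustering: two items are joined when each appears in the other's list of k nearest neighbours and the two lists share at least k_min members. The descriptor vectors, the squared Euclidean distance and the parameter values are assumptions chosen for illustration; published variants differ in these details.

```python
def jarvis_patrick(points, k, k_min):
    """Return a cluster label for each point (shared-neighbour clustering)."""
    n = len(points)

    def dist(a, b):
        # Squared Euclidean distance between two descriptor vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # k nearest neighbours of every point (the point itself is excluded).
    neighbours = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: dist(points[i], points[j]))
        neighbours.append(set(others[:k]))

    # Union-find structure used to merge points into clusters.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            mutual = j in neighbours[i] and i in neighbours[j]
            shared = len(neighbours[i] & neighbours[j])
            if mutual and shared >= k_min:
                parent[find(i)] = find(j)

    return [find(i) for i in range(n)]


# Two well separated groups of made-up 2D descriptor vectors.
compounds = [(0, 0), (0, 1), (1, 0), (1, 1),
             (10, 10), (10, 11), (11, 10), (11, 11)]
print(jarvis_patrick(compounds, k=3, k_min=2))
```

Because no distance matrix has to be merged level by level, a single pass over the neighbour lists is enough, which is consistent with the speed advantage of non-hierarchical methods mentioned earlier.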


C. Machine Learning 


Machine learning is a field of computer science that gives
computer systems the ability to “learn” with data. It is
employed in a range of computing tasks where designing
and programming explicit algorithms with good performance
is difficult. Machine learning focuses on prediction-making
through the use of computers. It is sometimes combined with
data mining. It can also be unsupervised and be used to
learn and establish baseline behavioural profiles for various
entities, which are then used to detect meaningful anomalies.

Machine learning techniques are becoming popular in the drug
discovery process, where they can be used to predict the
biological activities of chemical compounds. The Weighted
Similarity Extreme Learning Machine (WELM) algorithm has been
proposed for this purpose [14]. The algorithm is powerful,
non-iterative and easy to program, and using WELM improved the
performance of virtual screening. Machine learning algorithms
were introduced to handle complex chemical data, and several
algorithms have been used, although the need for better
prediction methods remains [15].

The medicinal chemistry data produced in the process of
developing new drugs has some characteristics that make it
difficult to apply machine learning techniques in the analysis [16].


D. Virtual screening 


Virtual screening (VS) is a computational technique used
in drug discovery to search libraries of small molecules in
order to identify those structures which are most likely to bind
to a drug target. Virtual screening has been defined as
“automatically evaluating very large libraries of compounds”
using computer programs. As the accuracy of the method has
increased, virtual screening has become an integral part of the
drug discovery process.

There are two categories of screening techniques: ligand- 
based and structure-based. 

In ligand-based virtual screening, given a group of structurally
diverse ligands that bind to a receptor, a model of the receptor
can be built by exploiting the collective information contained
in this set of ligands. Such models are known as pharmacophore
models. A candidate ligand can then be compared to the
pharmacophore model to determine whether it is compatible
with it and therefore likely to bind. Another approach to
ligand-based virtual screening is to use 2D chemical similarity
analysis methods to scan a database of molecules against one
or more active ligand structures.
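
The 2D similarity approach can be illustrated with a short sketch. It assumes that each molecule has already been reduced to a fingerprint, represented here as the set of its "on" bit positions, and it uses the Tanimoto coefficient, which is a common choice for this comparison but is an assumption here rather than something prescribed by the text. The molecule names and bit positions are invented.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0
    return len(fp_a & fp_b) / union


def rank_by_similarity(query_fp, library):
    """Rank library molecules by decreasing similarity to the query."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical active ligand and a tiny hypothetical screening library.
active_ligand = {1, 2, 3, 4}
library = {
    "mol_a": {1, 2, 3, 4},
    "mol_b": {1, 2, 5, 6},
    "mol_c": {7, 8},
}

for name, score in rank_by_similarity(active_ligand, library):
    print(name, round(score, 3))
```

Molecules at the top of the ranking are the ones a ligand-based screen would forward for further evaluation.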

Structure-based virtual screening involves docking of can- 
didate ligands into a protein target followed by applying a 
scoring function to estimate the likelihood that the ligand will 
bind to the protein with high affinity. 

Two methods used for virtual screening in drug discovery are:

1) Iterative MapReduce: In cheminformatics, iterative
MapReduce is used in drug discovery to search large
libraries of molecular compounds. Virtual screening frequently
uses the SVM (a supervised machine learning technique) for
regression and classification analysis of the molecules.
An iterative MapReduce programming model has been used to
present a MapReduce implementation of SVM-based virtual
screening; this implementation has scope for using large public
cloud infrastructures efficiently for virtual screening.

In science, the number of applications based on iterative
algorithms is increasing day by day. MapReduce refers to an
acyclic data flow model and does not have built-in facilities
for iterative programs. To overcome this problem, the output of
the previous MapReduce job is used as the input to the next
MapReduce job to achieve iterative behaviour. Each MapReduce
iteration incurs additional cost in network bandwidth,
input/output and CPU, because the data must be read and
reprocessed. Iterative MapReduce can be applied in data mining,
graph processing, model fitting, etc. [17].
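
The chaining of jobs described above can be sketched in a few lines of single-process Python. The example does not implement the SVM-based screening of the cited work; it swaps in one-dimensional k-means as a simpler iterative algorithm, and all function names and data values are hypothetical. The point is only the driver loop, in which the output of one map/reduce round becomes the input of the next.

```python
from collections import defaultdict


def map_phase(values, centroids):
    """Map step: emit (index of nearest centroid, value) for every value."""
    for v in values:
        nearest = min(range(len(centroids)),
                      key=lambda i: abs(v - centroids[i]))
        yield nearest, v


def reduce_phase(pairs, centroids):
    """Reduce step: average the values emitted for each centroid index."""
    groups = defaultdict(list)
    for key, v in pairs:
        groups[key].append(v)
    # A centroid that received no values keeps its previous position.
    return [sum(groups[i]) / len(groups[i]) if groups[i] else centroids[i]
            for i in range(len(centroids))]


def iterative_map_reduce(values, centroids, max_rounds=20, tol=1e-9):
    """Driver: feed the output of each round into the next one."""
    for _ in range(max_rounds):
        updated = reduce_phase(map_phase(values, centroids), centroids)
        if all(abs(a - b) <= tol for a, b in zip(updated, centroids)):
            return updated
        centroids = updated
    return centroids


# Made-up one-dimensional property values forming two groups.
data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
print(iterative_map_reduce(data, centroids=[0.0, 5.0]))
```

In a real deployment every round re-reads the full data set, which is the extra bandwidth, input/output and CPU cost noted in the paragraph above.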

2) Case-Based Meta-Learning algorithm: CBML identifies
the best predictor for a specific new case, based on similar
cases among its benchmarks. This method is used to select the
predictor that has the best performance rather than averaging
the performance of all predictors. CBML supports cheminformatics
and bioinformatics by providing robust and reliable predictive
modelling. CBML is applicable to problems in areas including
protein structure prediction, protein interaction,
disease-causing mutations and the functional roles of non-coding
DNA [18].
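
The selection idea can be sketched as follows. This is a loose illustration under stated assumptions and not the algorithm of the cited work: each benchmark case is assumed to be stored as a feature vector together with the error each predictor made on it, similarity is taken to be squared Euclidean distance, and the predictor names and numbers are invented.

```python
def select_predictor(new_case, benchmark):
    """Pick the predictor that did best on the most similar benchmark case.

    benchmark is a list of (features, errors) pairs, where errors maps a
    predictor name to the error it made on that benchmark case.
    """
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Find the benchmark case closest to the new case.
    _, errors = min(benchmark, key=lambda case: distance(new_case, case[0]))
    # Return the predictor with the lowest recorded error on that case.
    return min(errors, key=errors.get)


# Hypothetical benchmark: two cases, two predictors.
benchmark_cases = [
    ((0.0, 0.0), {"svm": 0.10, "knn": 0.30}),
    ((5.0, 5.0), {"svm": 0.40, "knn": 0.05}),
]

print(select_predictor((4.5, 5.2), benchmark_cases))
```

The contrast with averaging is that a predictor which is poor overall can still be chosen when it is the best one in the neighbourhood of the new case.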


III. DEEP LEARNING


There is a lack of system security, reliability, standards, and 
some communication protocols. It is difficult to integrate the 
Internet and EC software with some existing applications and 
databases. Market culture is averse to electronic commerce
(customers cannot touch or try the products). Users face a loss
of privacy, and regions and countries face a loss of cultural and
economic identity.


IV. DATA STORAGE 


A chemical database is a database specifically designed to
store chemical information. This information covers chemical
and crystal structures, spectra, reactions and syntheses, and
thermophysical data.

Chemical structures are traditionally represented using lines
indicating chemical bonds between atoms and drawn on paper.
While these are ideal visual representations for the chemist,
they are unsuitable for computational use and especially for 
search and storage. Small molecules (also called ligands in 
drug design applications), are usually represented using lists of 
atoms and their connections. Large molecules such as proteins 
are however more compactly represented using the sequences 
of their amino acid building blocks. Large chemical databases 
for structures are expected to handle the storage and searching 
of information on millions of molecules taking terabytes of 
physical memory. 
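
A minimal sketch of the two representations follows. The dictionary layout, the bond tuples and the helper names are assumptions made for illustration and do not correspond to any particular file format; hydrogens are left implicit, and the peptide sequence is an arbitrary example.

```python
from collections import Counter

# Small molecule as a list of atoms plus a list of connections.
# Each bond is (index of first atom, index of second atom, bond order).
ethanol = {
    "atoms": ["C", "C", "O"],
    "bonds": [(0, 1, 1), (1, 2, 1)],
}


def adjacency(molecule):
    """Build an atom-index -> neighbour-indices table from the bond list."""
    table = {i: [] for i in range(len(molecule["atoms"]))}
    for first, second, _order in molecule["bonds"]:
        table[first].append(second)
        table[second].append(first)
    return table


def element_counts(molecule):
    """Count how many atoms of each element the molecule contains."""
    return Counter(molecule["atoms"])


# Large molecule as a sequence of one-letter amino acid codes.
peptide = "GAVLK"

print(adjacency(ethanol))
print(element_counts(ethanol))
print(len(peptide), "residues")
```

The atom and bond lists make graph operations such as neighbour lookup straightforward, while the sequence string stores one character per residue instead of every atom.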

Reaction databases hold a collection of information about
products, educts and reaction mechanisms. The complex task of
predicting chemical reactivity uses reaction databases; it is
one of the main tasks of cheminformatics. Reaction prediction
and knowledge mining take a lot of time.

The first task of chemoinformatics is to represent
chemical compounds in the computer for effective processing of
chemical reactions [20]. Most chemical databases store
information on stable molecules, but databases for reactions
also store intermediates and temporarily created unstable
molecules.


V. DATA VISUALISATION 


Data visualization is a technique for creating images, di- 
agrams, or animations to communicate a message. Visual- 
ization through visual imagery has been an effective way to 
communicate ideas. Visualization has expanding applications 
in science, education, engineering, interactive multimedia, 
medicine, etc. Typical of a visualization application is the 
field of computer graphics. Data visualization is a related 
subcategory of visualization dealing with statistical graphics 
and geographic or spatial data that is abstracted in schematic 
form. Data visualization is an important topic in data mining. 
The visualization method supports data exploration. 

Cheminformatics mainly focuses on chemical information
such as chemical structures; spectral and high-throughput
screening data; structure-activity relationships; and medicinal
property data [19]. Chemical data consists of data tables of
chemical properties. It is difficult to visualize many records
and many columns and to integrate the data. Information
visualization is integrated into bioinformatics and
cheminformatics.

Data visualization extracts information from large amounts
of raw data; it is a technique for creating images, diagrams,
or animations to communicate a message. Most people
are not able to visualize data in more than three dimensions,
so data visualization helps them by presenting the data visually.
Visualization is useful for extracting useful information from
real-life data. Analysing multivariate datasets in
cheminformatics is difficult due to the large amount of data and
inherent noise. Virtual machines are used in cheminformatics for
developing and testing software for multiple systems. Data
visualization is used in drug discovery communities. It is
useful for understanding natural groupings in a large
multivariate dataset [20].


VI. TOOLS 


Cheminformatics toolkits are assemblies of tools: a single
utility program, a set of software routines, or a complete
integrated set of software utilities that are used to develop
and maintain applications and databases, and that are used in
virtual screening, chemical database mining, and
structure-activity studies. Toolkits are often used for
experimentation with new methodologies. Their most important
functions deal with the manipulation of chemical structures and
comparisons between structures. Corey and Wipke first showed in
1969 how computers could be used for chemical synthesis, using
the program OCSS [21].

Toolkits provide the following functionality: 


• Read and save structures in various chemistry file formats.
• Determine if one structure is a substructure of another.
• Determine if two structures are equal (matching).
• Identify substructures common to the structures in a set.
• Disassemble molecules, splitting them into fragments.
• Assemble molecules from elements or submolecules.
• Apply reactions to input reactant structures, resulting in
  output of reaction product structures.
• Generate molecular fingerprints. Fingerprints are bit
  vectors where individual bits correspond to the presence
  or absence of structural features. The most important use
  of fingerprints is in the indexing of chemistry databases [22].
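
The use of fingerprints for indexing can be sketched with plain Python integers as bit vectors. The feature list and the molecules are hypothetical, and real toolkits derive hundreds or thousands of bits automatically from the structure. The sketch shows the usual pre-filter: a database entry can contain the query substructure only if every bit set in the query is also set in the entry, so entries failing this cheap test are skipped before the expensive substructure match.

```python
# Hypothetical structural features; bit i of a fingerprint is set when
# feature i is present in the molecule.
FEATURES = ["aromatic_ring", "hydroxyl", "carbonyl", "amine"]


def make_fingerprint(present_features):
    """Encode a collection of feature names as an integer bit vector."""
    bits = 0
    for position, feature in enumerate(FEATURES):
        if feature in present_features:
            bits |= 1 << position
    return bits


def may_contain(candidate_fp, query_fp):
    """True when every bit of the query is also set in the candidate."""
    return (candidate_fp & query_fp) == query_fp


database = {
    "phenol": make_fingerprint({"aromatic_ring", "hydroxyl"}),
    "acetone": make_fingerprint({"carbonyl"}),
}
query = make_fingerprint({"hydroxyl"})

# Only entries passing the bit test need a full substructure comparison.
candidates = [name for name, fp in database.items() if may_contain(fp, query)]
print(candidates)
```

The bit test can produce false positives but never false negatives, which is why it is safe to use as a first screening stage.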


Some popular tools are:

3) LogCHEM: Structural activity prediction for a small set
of compounds or drugs is one of the most important tasks in
cheminformatics. LogCHEM is an inductive logic programming
(ILP) based tool used to mine large cheminformatics datasets
effectively. ILP is a subfield of machine learning that is
well suited to explaining the chemical activity of compounds
based on their structure and properties. The LogCHEM system is
an expert tool used for the discriminative interactive mining of
chemical fragments, and it communicates through a convenient
user interface. LogCHEM can input data from chemical
representations such as MDL's SDF file format, display molecules
and matching patterns through tools such as VMD [23], and
interface with external tools.

LogCHEM is an ideal tool for mining data patterns from
large chemical datasets. The logical representation benefits
LogCHEM in many ways, although the representation is less
compact. The advantages of LogCHEM are:


• Both atoms and their location information can be stored,
  which is useful for interacting with external tools.
• It supports macro structures.
• It can support search algorithms implemented in ILP.


4) MMsINC: MMsINC is well suited to use as a large database.
It is a free, web-oriented database of commercially available
compounds. It contains compounds in 3D format and stores more
than 4 million non-redundant chemical compounds.

The steps in the construction of MMsINC are:


1) First redundancy washing: The first redundancy washing
   step removes duplicates and different protonation states of
   the same molecule.

2) Tautomer generation: Tautomers are an important
   class of isomers that can interconvert under physiological
   conditions. Tautomer generation was performed with the
   LigPrep tool of the Schrödinger suite [24].

3) Ionic state generation: This step calculates the most
   favorable ionic states for each molecular entry, including
   all calculated tautomers, using the Protonate utility
   of the MOE suite. This algorithm evaluates
   the most energetically favorable protonation state using
   the Generalized Born electrostatics model [25].

4) 3D-conformer generation: The three-dimensional
   structure of each molecule (including all tautomers and
   ionic states) was obtained using Corina 3.4 [26].

5) Uniqueness hunting: This step characterizes the
   uniqueness of each chemical structure.

6) Molecular descriptor prediction: This step calculates
   twenty-seven molecular properties useful for QSAR,
   diversity analysis or combinatorial library design.

MMsINC is integrated with the PubChem and PDB databases,
facilitating the cross exchange of ligand information.

Fig. 2. Flowchart of MMsINC Construction

5) KuChemBio: Experiment reproducibility and the comparison
of methods require the chemical structures of the
compounds, but many publications do not publish the chemical
structures used in their experiments. The KU Chemical Biology
Database (KuChemBio) addresses this problem by establishing a
collection of 69 data sets for computational chemical experiments.

In KuChemBio, chemical structures and their related
biological activities are collected from different repositories
and hosted for easier access. KuChemBio contains data in
different categories, including ADME, toxicity, binding affinity,
solubility, melting point, mutagenicity, etc. [27].

6) ODDT: The Open Drug Discovery Toolkit is a free
and open source tool intended for both computer aided
drug discovery (CADD) developers and researchers. ODDT
reimplements many state-of-the-art methods, such as machine
learning scoring functions (RF-Score and NNScore), and wraps
other external software to ease the process of developing
CADD pipelines [28].

The Open Drug Discovery Toolkit is offered as a Python
library to the cheminformatics community. The most convenient
way of installing ODDT is using pip. All required Python
modules will be installed automatically, although the underlying
toolkits, either OpenBabel or RDKit, need to be installed manually.
The most important and handy properties of a Molecule in ODDT
are NumPy dictionaries containing most properties of the supplied
molecule. Some of them are straightforward; others require
some calculation, e.g. atom features. Dictionaries are provided
for the major entities of a molecule: atoms, bonds, residues and
rings. They were primarily used for interaction calculations,
although they are applicable to any other calculation. The main
benefit is NumPy broadcasting and subsetting.


7) PubChem: As one of the largest publicly accessible
databases hosting chemical structures and biological
activities, PubChem has been processing bioassay submissions
from the community since 2004. With the increase in the volume
of data deposited in PubChem, the diversity and wealth of
its information content also grows. Recently, the Tox21 program
has deposited a series of pairwise data sets in PubChem relating
to different mechanisms of action (MOA), such as androgen
receptor (AR) agonist and antagonist datasets, to study cell
toxicity. Little work has been reported from cheminformatics
studies of these pairwise datasets, which may provide insight
into the mechanisms of action of the compounds
and the relationship between chemical structures and functions,
as well as guidance for lead compound selection and
optimization. To fill this gap, a comprehensive
cheminformatics analysis has been performed, including scaffold
analysis, matched molecular pair (MMP) analysis and activity
cliff analysis, to investigate the structural characteristics
and discontinuous structure-activity relationships of the
individual datasets and the combined dataset.


8) SMSD: SMSD is a Java-based software library for
calculating the Maximum Common Subgraph (MCS) between
small molecules. This makes it possible to find the
similarity/distance between two molecules. MCS is also used for
screening drug-like compounds by finding hit molecules that
share a common subgraph (substructure).
VII. CONCLUSION


In this paper, we discussed the various fields of
cheminformatics. Drug discovery, data storage and data
visualization are the major fields of cheminformatics. Machine
learning, virtual screening and deep learning methods are used
in drug discovery. Clustering algorithms are also used in the
drug discovery process. Cheminformatics toolkits are software
development kits that allow cheminformaticians to develop
custom computer applications for use in virtual screening,
chemical database mining, and structure-activity studies. This
is a survey paper on cheminformatics covering 2008 to 2018, for
which we used many journals to gather information about
cheminformatics.
Abstract—The concept of deep learning originated from
artificial neural network research. Artificial neural networks
(ANNs), or connectionist systems, are computing systems inspired
by biological neural networks. Deep learning architectures such
as deep neural networks, deep belief networks and recurrent
neural networks have been applied to fields including computer
vision, speech recognition, natural language processing, audio
recognition, social network filtering, machine translation,
bioinformatics and drug design, where they have produced results
comparable to and in some cases superior to human experts.
Most of the upcoming technologies related to neural networks
use deep learning. Here we present a survey that collects the
technologies that use deep learning to show that deep learning
is unavoidable in many future applications.

Index Terms—Deep learning, convolutional neural network,
deep convolutional neural networks, artificial neural networks.


I. INTRODUCTION 


Deep learning is part of a broader family of machine learn- 
ing methods based on learning data representations, as opposed 
to task-specific algorithms. Deep learning architectures such 
as deep neural networks, deep belief networks and recurrent 
neural networks have been applied to fields including computer 
vision, speech recognition, natural language processing, au- 
dio recognition, social network filtering, machine translation, 
bioinformatics and drug design, where they have produced 
results comparable to and in some cases superior to human 
experts. 

Machine learning is a field of computer science that gives 
computer systems the ability to “learn” with data, without 
being explicitly programmed. 

Artificial neural networks (ANNs), or connectionist systems,
are computing systems inspired by the biological neural
networks of animal brains. Such systems learn to do tasks
by considering examples, generally without task-specific
programming. They have found most use in applications that are
difficult to express with a traditional computer algorithm
using rule-based programming. The term Deep Learning was
introduced to the machine learning community by Rina Dechter in
1986. The first general, working learning algorithm for
supervised, deep, feedforward, multilayer perceptrons was
published by Alexey Ivakhnenko and Lapa in 1965. A 1971 paper
described a deep network with 8 layers trained by the group
method of data handling algorithm.

Deep learning has many applications. In deep learning for RF
device fingerprinting, convolutional neural network (CNN)
techniques, which provide state-of-the-art performance in image
and speech recognition problems, are applied to the problem of
radio frequency (RF) fingerprinting. The method relies on
steady-state analysis of the signal and achieves high
identification and verification accuracy on 7 ZigBee devices.
It also permits use of the full transmission for fingerprinting
regardless of the data content of the signal.

In face recognition systems, deep convolutional neural
networks (DCNNs) have shown impressive performance improvements
on various object detection and recognition problems. Powerful
graphics processing units (GPUs) and DCNNs have improved the
capabilities of machines in understanding faces and
automatically executing the tasks of face detection, pose
estimation, landmark localization, and face recognition from
unconstrained images and videos.

Particle filtering applied to lip tracking introduces a
new pattern recognition model for segmenting and tracking
lip contours in video sequences. The computation of the
expected segmentation is based on a filtering distribution.
This is a difficult task because one has to compute the expected
value over the whole parameter space of the segmentation. Using
sequential Monte Carlo sampling methods, a combination of
new transition and observation models, and a new proposal
distribution, the approach provides accuracy and robustness to
imaging conditions and drifting.

In robotic autonomous navigation, deep learning is used to
extract features, inspired by the working pattern of the
biological brain. In human activity recognition, a deep
learning (DL) algorithm recognizes human activities; data
collected from participants performing activities are used to
evaluate the recognition results. In a novel approach for
training a topological deep neural network with visual
impression, combining a denoising autoencoder model and a
contractive autoencoder with a Hessian regularization model
yields a deterministic autoencoder that aims for robustness to
small variations of the input. In the application of
recognizing human activity in smart homes using a deep learning
algorithm, data are likewise collected from participants
performing activities in order to evaluate the human activity
recognition results.

The latest applications of deep learning include futuristic
technologies such as automated car driving, 3D face
recognition, hand gesture identification, underwater targeting,
and human activity monitoring. The automated car driving
system uses the GLAD algorithm to sense the road and behave
according to the traffic conditions, which reduces human
effort and accidents. 3D face recognition uses the SDAE model
to recognize the faces of individuals more efficiently than
the older 2D face recognition methods. It improves the accuracy
of face recognition and provides good security features.
Another interesting technology is hand gesture identification,
which uses an artificial neural network (ANN), inspired by the
human brain, to identify the hand gestures made by individuals
and react to them. It reduces the use of hand-held control
devices. Underwater targeting is another technology, a valuable
achievement for marine and naval forces; it uses the SDAE and
DBN algorithms to track moving underwater objects using signals
of different frequencies. It identifies targets with higher
accuracy than older methods. One of the most useful
technologies for the home is human activity monitoring. It
allows us to track our routines and is also helpful for
security purposes. Deep learning methods are implemented in
advanced technologies that may shape the future of our world.


II. INTRODUCTION TO DEEP LEARNING FOR VARIOUS 
APPLICATIONS 


This section introduces different applications that use
deep learning.


A. Deep Learning and Its Applications to Signal and Infor- 
mation Processing 


Today, signal processing research has a significantly wider
scope than it had just a few years ago [1], and
machine learning has been an important technical area of the
signal processing society. Since 2006, deep learning, a new
area of machine learning research, has emerged, impacting a
wide range of signal and information processing work within
both the traditional and the new, widened scopes. Many
traditional machine learning and signal processing techniques
exploit shallow architectures, which contain a single layer of
nonlinear feature transformation. A property common to these
shallow learning models is a simple architecture that consists
of only one layer responsible for transforming the raw input
signals or features into a problem-specific feature space,
which may be unobservable. Human information processing
mechanisms (e.g., vision and speech), however, suggest the need
for deep architectures for extracting complex structure and
building internal representations from rich sensory inputs
(e.g., natural images and their motion, speech, and music). For
example, human speech production and perception systems are
both equipped with clearly layered hierarchical structures for
transforming information from the waveform level to the
linguistic level and vice versa. It is natural to believe that
the state of the art can be advanced in processing these types
of media signals if efficient and effective deep learning
algorithms are developed. Signal processing systems with deep
architectures are composed of many layers of nonlinear
processing stages, where each lower layer's outputs are fed to
its immediate higher layer as the input. The successful deep
learning techniques developed so far share two additional key
properties: the generative nature of the model, which typically
requires an additional top layer to perform the discriminative
task, and an unsupervised pretraining step that makes effective
use of large amounts of unlabelled training data for extracting
structures and regularities in the input features.
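
The layered structure described above, in which each layer's outputs feed the next layer, can be sketched in a few lines. This is only an illustration of the forward pass: the weights are arbitrary hand-picked numbers, the `tanh` nonlinearity is an assumption, and no training or pretraining is shown. The names `dense_layer` and `forward` are hypothetical.

```python
import math


def dense_layer(inputs, weights, biases):
    """One nonlinear processing stage: tanh(W x + b), one output per weight row."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Feed each layer's output to the next higher layer as its input."""
    activation = inputs
    for weights, biases in layers:
        activation = dense_layer(activation, weights, biases)
    return activation


# Three stacked stages with illustrative (untrained) parameters.
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),  # 3 inputs -> 2 units
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),             # 2 units -> 2 units
    ([[0.7, -0.3]], [0.2]),                              # 2 units -> 1 output
]
output = forward([1.0, 2.0, 3.0], layers)
```

A shallow model in the sense used above would correspond to a single entry in `layers`; depth comes from composing several such nonlinear stages.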


B. An Introduction to Deep Learning for the Physical Layer 


The work in [2] discusses several novel applications of deep
learning (DL) for the physical layer. By interpreting a
communications system as an autoencoder, the authors develop a
fundamentally new way to think about communications system
design as an end-to-end reconstruction task that seeks to
jointly optimize transmitter and receiver components in a
single process. They show how this idea can be extended to
networks of multiple transmitters and receivers and present the
concept of radio transformer networks (RTNs) as a means to
incorporate expert domain knowledge in the machine learning
(ML) model. Lastly, they demonstrate the application of
convolutional neural networks (CNNs) on raw IQ samples for
modulation classification, which achieves competitive accuracy
with respect to traditional schemes relying on expert features.
The paper concludes with a discussion of open challenges and
areas for future investigation.

Communications is a field of rich expert knowledge about 
how to model channels of different types, compensate for 
various hardware imperfections, and design optimal signalling 
and detection schemes that ensure a reliable transfer of data. 
As such, it is a complex and mature engineering field with 
many distinct areas of investigation which have all seen
diminishing returns with regard to performance improvements,
in particular on the physical layer. Because of this, there is
a high bar of performance over which any machine learning 
(ML) or deep learning (DL) based approach must pass in order 
to provide tangible new benefits. 

In domains such as computer vision and natural language 
processing, DL shines because it is difficult to characterize real 
world images or language with rigid mathematical models. 
For example, while it is an almost impossible task to write 
a robust algorithm for detection of handwritten digits or 
objects in images, it is straightforward today to implement DL 
algorithms that learn to accomplish this task beyond human 
levels of accuracy. In communications, however, we can design 
transmit signals that enable straightforward analytic algorithms 
for symbol detection for a variety of channel and system 
models (e.g., detection of a constellation symbol in additive 
white Gaussian noise (AWGN)). Thus, as long as such models 
sufficiently capture real effects, we do not expect DL to yield 
significant improvements on the physical layer. 
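
As an illustration of the kind of straightforward analytic detector mentioned above, the following sketch applies minimum-distance detection to a constellation symbol in AWGN. For equally likely symbols this rule is the maximum-likelihood decision. The QPSK constellation, the noise level, and the function names `awgn` and `detect` are assumptions chosen for the example, not taken from [2].

```python
import random

# Illustrative QPSK constellation (unnormalized), indexed 0..3.
QPSK = [complex(1, 1), complex(-1, 1), complex(-1, -1), complex(1, -1)]


def awgn(symbol, noise_std, rng):
    """Add independent Gaussian noise to the real and imaginary parts."""
    return symbol + complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))


def detect(received, constellation=QPSK):
    """Return the index of the constellation point closest to the sample."""
    return min(
        range(len(constellation)),
        key=lambda i: abs(received - constellation[i]),
    )


rng = random.Random(0)  # fixed seed so the run is repeatable
sent = [rng.randrange(len(QPSK)) for _ in range(1000)]
received = [awgn(QPSK[i], 0.3, rng) for i in sent]
decoded = [detect(r) for r in received]
errors = sum(s != d for s, d in zip(sent, decoded))
```

Because the channel model is known exactly, no training data is needed for this detector, which is the point made in the paragraph above.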
III. INDOOR APPLICATIONS


In "Door recognition and deep learning algorithm for visual
based robot navigation" [3], a new method based on deep
learning is used for autonomous robot navigation. Unlike
most traditional methods, which are based on fixed models, a
convolutional neural network (CNN) modelling technique from
deep learning is selected to extract features, inspired by the
working pattern of the biological brain. This neural network
model has multi-layered features with which the ambient scene
can be recognized and useful information, such as the location
of a door, can be identified. The extracted information can be
used for robot navigation, so that the robot can approach the
target accurately.

Navigation is one of the hot topics in robotic systems and
plays a key role in intelligent and autonomous robot
movement. The objective of navigation is to execute specific
tasks based only on environmental information, without
human intervention. Many navigation methods have been
developed, e.g., inertial navigation, visual navigation, sensor
data navigation, GPS navigation and satellite navigation. These
methods can be applied to navigate robotic systems in various
environments, including indoor and outdoor, structured and
unstructured environments. The visual approach is an advanced
navigation method developed in recent years.

Indoor navigation can be roughly classified into three
types: map-based, map-building-based, and map-less
navigation. In map-based navigation, the robot achieves indoor
navigation by combining visual odometry with the map model. It
can also process the node information of Wi-Fi to calculate the
location. The main method of map-building-based navigation is
SLAM (Simultaneous Localization and Mapping), which can achieve
better performance in a limited space. However, many challenges
remain to be solved. Due to the variety of environments and
sensor constraints, it is difficult for a robot to complete
navigation using visual information alone. The robot therefore
needs a powerful and robust perception system or algorithm
to acquire accurate environmental information, e.g., paths,
doors and obstacles, so that it can navigate by itself in
complex environments. Another challenge is that robots can
easily mistake similar objects for the targets in a complex
environment, which leads to a wrong response.

Deep convolutional neural networks can accommodate a certain
degree of transformation, deformation and illumination
variation. They also have a light computational load
for scanning the entire image to detect objects of interest.
Therefore, a deep convolutional neural network could be a good
solution to the above challenges. It is based on unsupervised
and supervised feature learning and has been widely
used in many applications; its structure is designed
with multiple network layers to simulate the learning process
of human beings. The posture and location of doors
can be identified by the trained convolutional neural network
(CNN), so that the robot can also be localized. Thus, a CNN can
improve the rate of correct door detection during autonomous
robot navigation.


Recognizing human activity in smart homes is another
application of deep learning algorithms [4]. As the world's
population ages, smart homes have attracted great interest
from many researchers with the aim of helping those
who suffer from diseases. An estimated 9% of adults
aged 65+ and 50% of adults aged 85+ need assistance with the
activities of daily living (ADLs). Before smart home
technologies can be deployed for the elderly, several
challenges must be solved. The first step is to collect data,
choosing activities that people perform frequently in their
daily lives. A deep learning model is used to recognize the
human activities. The network has 4 hidden layers and is
pre-trained layer by layer using the restricted Boltzmann
machine (RBM) algorithm. Fine-tuning is then done using the CG
algorithm. The scheme uses sensors that monitor motion,
temperature, water, door, burner, and item use. All the motion
sensors are located on the ceiling of the smart home. The model
is compared with the hidden Markov model (HMM) and the naive
Bayes classifier (NBC). The experimental results show that the
proposed deep learning algorithm is an effective way of
recognizing human activities in smart homes. However, some
challenges remain, such as choosing the number of units in each
layer and the number of epochs. These will serve as the
research focus in the future.

Human physical activity recognition based on computer
vision with a deep learning model is another application [?].
Human activity recognition is an active research area in
computer science. It has attracted increasing attention because
of its wide range of applications in security monitoring,
health assessment and analysis, elderly health, information
security, human-machine interaction, and the searching of other
human-related content.

Human physical activity is recognized based on skeleton
data of the human body from the Microsoft Kinect sensor.
The model uses the human skeleton data from the
CAD-60 dataset to recognize human physical activity
without using any prior knowledge. This reduces the work in
the data pre-processing and feature extraction stages. It can
also improve the generalization performance and robustness
of the model and give a better understanding of human
physical activity. Different tricks that can improve the
performance of neural networks, such as regularization
methods and alternative activation functions, are tested. A
model for human physical activity recognition consisting of a
convolutional neural network + multilayer perceptron + max-norm
is designed, and the corresponding experiment achieved a result
of 81.8%. The model can identify a sequence of motion skeleton
data and understand the meaning of the body's actions, rather
than a single-moment picture of a skeleton.


IV. SIGNALING APPLICATIONS 


For many years, radio signal classification and modulation
recognition have been accomplished by carefully hand-crafting
specialized feature extractors for specific signal types and
properties and deriving compact decision bounds from them using
either analytically derived decision boundaries or statistically
learned boundaries within low-dimensional feature spaces. An
in-depth study of the performance of deep learning-based
classification of radio communications signals [5] finds that DL
methods continue to show enormous promise in improving
radio signal identification sensitivity and accuracy, especially
for short-time observations. The study finds deep networks to be
increasingly effective when leveraging deep residual
architectures and shows that synthetically trained deep networks
can be effectively transferred to over-the-air (OTA) datasets
with a loss of around 7% accuracy, or directly trained
effectively on OTA data if enough training data is available.

Deep learning is also used in submarine and underwater
applications. The main task in underwater target recognition
is to extract and express the features of underwater acoustic
signals. The traditional recognition methods rely on
extracting a few features, but the composition of underwater
signals is very complex. With the rapid development
of computer technology, common machine learning algorithms
such as neural networks, support vector machines and the maximum
entropy method have been introduced into underwater target
recognition. The deep belief network (DBN) and the stacked
denoising autoencoder (SDAE) are used to recognize underwater
targets. Features are extracted from the underwater target
radiated signal by dimension reduction based on these deep
learning models, and then target recognition can be realized.
Because underwater target radiated noise has a complex
composition, the underwater acoustic data is pre-processed and
features are extracted from the target signal in order to
reduce the computational complexity of the deep learning
algorithm and improve the recognition efficiency. The support
vector machine (SVM), general regression neural network (GRNN)
and probabilistic neural network (PNN) were used for
comparison. The underwater acoustic data of different targets,
as well as the underwater acoustic data of one target in
different navigation states, were identified. Long-distance and
other types of targets are identified by adjusting the
frequency and amplitude of the detection waves. The SDAE and
DBN algorithms can effectively improve the classification
accuracy for underwater targets.


V. VISUAL APPLICATIONS 


Here we discuss some major applications related to vision.


A. Deep Learning Algorithm with Visual Impression 


The effectiveness of the human visual system has long been
studied to uncover the cognitive patterns of humans by
combining visual perception with mechanisms of cognition to
simulate the information processing model of the human brain.
When exposed to the rapidly changing world, human brains
are filled with information from all the sensory organs. This
information is converted into various specific patterns via
cognition, and these are stored in the brain in the form of
memories, which help humans to understand. Statistics show that
during the human perception process, 80 percent of cognitive
information comes from vision. Among the numerous impressions
formed by human brains, visual impression plays an important
role in the cognition process. The visual impression model
shows how image representations learned on large-scale
annotated datasets can be efficiently transferred to other
visual recognition tasks with a limited amount of training
data. By using a recognition visual impression model and a
generalization visual impression model, the mid-level features
of visual impression can be reused on another dataset for the
recognition process.

Image representations learned with a sparse autoencoder on
a very large number of annotated image samples can be
efficiently transferred to new visual recognition tasks with
a limited amount of training data. A new network structure is
designed to reuse the visual impressions at different levels of
abstraction, which are computed using an existing network. The
transferred visual impressions for images in the
previous dataset are shown to lead to significantly improved
results for object classification.


B. Image Recognition Method 


Deep learning is widely applied in traditional artificial
intelligence domains. Deep learning methods are classified
as convolutional neural networks (CNNs), restricted Boltzmann
machines (RBMs), autoencoders, and sparse coding.
A CNN consists of three main types of neural layers:
convolutional layers, pooling layers, and fully connected
layers. The different kinds of layers play different roles. For
image recognition [6], CNN-based methods require a fixed-size
input image. This restriction may reduce the recognition
accuracy for images of arbitrary size. To eliminate this
limitation, a CNN architecture is used in which the last
pooling layer is replaced with a spatial pyramid pooling layer.

Spatial pyramid pooling can extract fixed-length
representations from arbitrary images, providing a flexible
solution for handling different scales, sizes, and aspect
ratios. However, deformation remains a problem in image
recognition. To deal with deformation more efficiently, Ouyang
et al. introduced a new deformation-constrained pooling layer,
called the def-pooling layer, to enrich the deep model by
learning the deformation of visual patterns. Restricted
Boltzmann machines (RBMs), autoencoders, sparse coding, and
deep energy models are also used for image recognition, but
CNN-based schemes are the most extensively utilized and the
most suitable for image recognition.
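
How spatial pyramid pooling yields a fixed-length representation from inputs of different sizes can be sketched as follows. This is a simplified, single-channel illustration in plain Python rather than the implementation used in [6]; the pyramid levels (1, 2, 4) and the function names `max_pool_grid` and `spatial_pyramid_pool` are assumptions for the example.

```python
def max_pool_grid(feature_map, bins):
    """Max-pool a 2D feature map into a bins x bins grid of values."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for by in range(bins):
        # Bin edges: floor for the start, ceiling for the end,
        # so every bin covers at least one row and one column.
        y0 = (by * height) // bins
        y1 = ((by + 1) * height + bins - 1) // bins
        for bx in range(bins):
            x0 = (bx * width) // bins
            x1 = ((bx + 1) * width + bins - 1) // bins
            pooled.append(
                max(feature_map[y][x] for y in range(y0, y1) for x in range(x0, x1))
            )
    return pooled


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Concatenate max-pooled grids; output length is sum(n * n for n in levels)."""
    features = []
    for bins in levels:
        features.extend(max_pool_grid(feature_map, bins))
    return features


# Two feature maps of different sizes give vectors of the same length.
small = [[r * 6 + c for c in range(6)] for r in range(4)]    # 4 x 6
large = [[(r * c) % 7 for c in range(5)] for r in range(7)]  # 7 x 5
vec_small = spatial_pyramid_pool(small)
vec_large = spatial_pyramid_pool(large)
```

With levels (1, 2, 4) each channel contributes 1 + 4 + 16 = 21 values regardless of the input size, which is what allows the following fully connected layers to accept images of arbitrary dimensions.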


VI. HUMAN AND MACHINE INTERACTION 


Machines can interact with humans in different ways, using
hand-held devices, sensors, cameras, and other devices.
Through these, machines can identify individuals and
react according to human behaviour by learning human
physique, gestures, or facial patterns, and thereby
become more interactive for the user. There are several
human-interactive technologies that use deep learning
techniques.
A. Recognition and Authentication


We first discuss face recognition technology. Face
recognition compares face images of known people in a
database with the face images of one or many people detected
in still images or dynamic videos, with the goal
of recognizing the people in a scene. Face recognition methods
based on images or videos basically use 2D images. The new
technology in this field is 3D face recognition [7]. This
method is basically divided into three parts: a recognition
algorithm based on spatial matching, a recognition algorithm
based on local feature matching, and a recognition algorithm
based on overall feature matching. The 3D face algorithm is
based on the deep learning SDAE theory. In the depth direction,
a face is made up of non-flat areas and flat areas. Non-flat
areas include the mouth, nose, eyes, eyebrows, and the border
of the skull; flat areas include the forehead and cheekbones.
Compared with flat areas, non-flat areas play a key role in
face recognition because they characterize the subtle
differences among faces in more detail. 3D face recognition
uses infrared cameras to collect thousands of face points; the
SDAE model is then used to analyse and compare facial features
and identify the individual. Another deep learning approach to
face recognition uses DCNNs, which have been shown
to be effective for image analysis and face recognition. The
process can be divided into region-based and sliding-window-based
approaches, both of which use different views or scenarios, key
points, and angles for identification. It first detects the
face area; an analysis is then performed using metric learning,
which compares the face with previously stored images to
identify and verify the person.
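
The final comparison step, matching a face against previously stored images, can be sketched as a nearest-neighbour search over feature vectors. The sketch assumes that a trained network has already mapped each face to an embedding vector; the three-dimensional vectors, the names, the cosine similarity measure, and the 0.8 threshold are all made up for illustration and are not taken from the surveyed work.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def identify(probe, gallery, threshold=0.8):
    """Return (name, score) of the best match, or (None, score) if too dissimilar."""
    best_name, best_score = None, -1.0
    for name, embedding in gallery.items():
        score = cosine_similarity(probe, embedding)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score  # verification fails: unknown person
    return best_name, best_score


# Hypothetical stored embeddings and two probe faces.
gallery = {"alice": [0.9, 0.1, 0.3], "bob": [0.1, 0.8, 0.5]}
match, match_score = identify([0.85, 0.15, 0.35], gallery)
unknown, unknown_score = identify([0.0, 0.1, -0.9], gallery)
```

In a metric learning setting, the network is trained so that embeddings of the same person end up close together and those of different people far apart; the threshold then separates accepted from rejected matches.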


B. Tracking and control 


Nowadays the way we interact with computers and other
devices is changing to a more human-like form of communication.
Hands help us in such communication, and hand gesture
tracking can play an important role in all these systems.
There are several problems in building a gesture technology:
high dimensionality, self-occlusions, processing
speed, uncontrolled environments, and rapid hand motion. Two
approaches have been proposed to solve these hand gesture
tracking problems: discriminative methods and generative
methods. Two main problems make hand gesture tracking
especially difficult: the large number of degrees of freedom of
the hand, and the rapid movements that we make in natural
gestures.

To overcome these problems, a novel algorithm that combines
some properties of discriminative and generative methods uses
the convolutional neural network (CNN). A convolutional
neural network can be seen as an evolution of the classical
artificial neural network (ANN) with an additional step in which
some pre-processing is done. An ANN is a system inspired by the
human brain. Pre-learned gestures are recognized using deep
learning techniques, with which a high average recognition rate
is achieved.

A recent tracking technology that uses deep learning is lip
tracking [8]. Lip tracking is important in audio-visual
speech recognition systems, such as automatic speech
recognition (ASR) systems, which are widely used in mobile
phones and car environments. Most ASR systems have concentrated
on the acoustic speech signal, which means that they are
susceptible to acoustic noise. Incorporating automatic lip
reading from visual information can improve the performance
of these systems. What makes lip tracking difficult is the high
variability of lip shapes, colours, and so on. Using a
sequential Monte Carlo sampling algorithm and MMDA, these
problems can be overcome, making lip tracking easier and
improving ASR efficiency.
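
The sequential Monte Carlo idea behind such trackers can be illustrated with a minimal bootstrap particle filter. This sketch tracks a single scalar (for example, one lip-contour parameter) and is not the model of [8]: the random-walk transition, the Gaussian observation model, the noise levels, and the function name `particle_filter` are all assumptions made for the example.

```python
import math
import random


def particle_filter(observations, n_particles=500, process_std=0.5,
                    obs_std=1.0, seed=0):
    """Bootstrap particle filter for a 1-D random-walk state.

    Returns the weighted-mean state estimate after each observation.
    """
    rng = random.Random(seed)
    # Initialise particles around the first observation.
    particles = [rng.gauss(observations[0], obs_std) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # Predict: propagate each particle through the transition model.
        particles = [p + rng.gauss(0.0, process_std) for p in particles]
        # Update: weight each particle by the Gaussian observation likelihood.
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:  # all likelihoods underflowed; fall back to uniform
            weights = [1.0] * n_particles
            total = float(n_particles)
        estimates.append(sum(w * p for w, p in zip(weights, particles)) / total)
        # Resample: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


# A slowly drifting state observed with noise.
noise = random.Random(1)
truth = [0.1 * t for t in range(50)]
observed = [x + noise.gauss(0.0, 1.0) for x in truth]
estimates = particle_filter(observed)
```

The weighted mean computed in each step approximates the expected value under the filtering distribution, which is the quantity that is hard to compute exactly over the whole parameter space.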


VII. OTHER APPLICATIONS 


Deep learning is also applicable to fibre optic systems [9].
Here we consider a fibre optic system with sensors.
Distributed fibre optic vibration sensors are used in many
areas, and fibre optic sensors are very common in the
monitoring and security of long perimeters. One type of fibre
optic system operates based on time-domain detection of the
backward Rayleigh scattered light of a short laser pulse
injected into the cable. Vibrations in the environment make the
sensor operate, because micro-vibrations in the optical fibre of
the sensor displace the Rayleigh scattering centres relative to
each other. As a result, spatially localized disturbances of the
sensor cause local modulation of the coherence reflectogram
intensity. Analysing the time-frequency and spatial features of
the intensity of local modulation in the coherence reflectogram
makes it possible to build algorithms for the detection and
classification of vibration and acoustic field disturbances in
the vicinity of monitored perimeters. However, there are some
major problems that cause the measurements to change. Therefore
a considerable depth of adaptability is essential to the
efficient and reliable operation of the recognition algorithms.
To solve this problem, a distributed fibre optic sensor system
using deep learning algorithms based on an ensemble of
convolutional neural networks (CNNs) is used. To test the
algorithm, real-life signals and jamming were used under
conditions similar to the normal modes of operation of the
instrument, simulating different weather conditions. The
approach of developing signal recognition algorithms based
on deep learning can be successfully extended to other
applications including radiolocation, hydroacoustics, supersonic
or magnetic scanning of materials, etc.

Another technology that uses deep learning is digital
fingerprint authentication or recognition. Here we discuss
digital fingerprint authentication in the area of cognitive
radio networks. In such a network, the spectrum can be used
by two types of users: primary users and secondary users. The
primary users have the highest priority for transmission, though
they only occupy their frequency bands for finite periods
of time. Secondary users perform spectrum sensing and fill
in temporarily unoccupied frequencies so that they do not
interfere with the operation of primary users. Attackers
use a primary user emulation technique in order to steal
spectrum usage. To prevent this, radio frequency
fingerprinting is used; it aims to identify transmitters by
characterizing device-specific features present in their emitted
analogue signals. These features are primarily a result of
hardware variability in the devices' analogue transmitter
components. The CNN deep learning technique was useful for
identifying each individual primary or secondary user, using
the device identity as a fingerprint, together with their paid
and allotted spectrum. This prevented attackers from using
emulation to fake the devices of primary users and provided
good security for the spectrum.


Another area that uses deep learning is autonomous
driving [10]. Both early efforts and recent studies in
autonomous driving research address various perception,
control, and decision challenges in this area. Early deep
learning methods using convolutional neural networks (CNNs)
extracted features from visual data accurately, which led to
their application in self-driving vehicles.


There are 3 general approaches used for autonomous driving:
the mediated perception approach, the behavior reflex approach,
and the direct perception approach. To test these approaches, an
open racing car simulator is used to simulate road and highway
driving conditions. The ConvNet algorithm provides superior
results compared with the other algorithms tested. It uses
several hardware components, such as a camera, radar, and
sensors, combined with a microcontroller. The other algorithms
fail when certain conditions occur on the road, but the ConvNet
algorithm handles all of these problems easily. The ConvNet is
based on AlexNet, one of the shallowest CNNs.


An improved algorithm that performs well above the
ConvNet is GoogLeNet for Autonomous Driving (GLAD). It
is a more realistic algorithm for direct perception. The algorithm
senses highway objects such as cars and lanes using
several sensing devices in order to make accurate decisions. To
determine which algorithm is best, a comparison was made among
the best CNN algorithms, and GoogLeNet showed better results with
lower error. GoogLeNet is 22 layers deep, so it can be trained
on more complex data and in more complex ways than the
other algorithms. GoogLeNet also provides better
environment sensing and decision making than the other
algorithms.


VIII. CONCLUSION 


In this paper we explained that deep learning has been
applied to fields including robotics, signalling, recognition and
other applications. We first collected papers related to
deep learning published from 2008 to 2018, divided them into
categories, and prepared a summary paper covering the technologies
that use deep learning in different areas. This paper does not cover
all published papers; only major topics are considered here.


Abstract—To test web applications, developers currently write
test cases in frameworks such as Selenium [6]. On the other hand,
most web test generation techniques rely on a crawler to explore
the dynamic states of the application. The first approach requires
much manual effort, but benefits from the domain knowledge of
the developer writing the test cases. The second one is automated
and systematic, but lacks the domain knowledge required to be
as effective. Dynamic symbolic execution has been shown to be an
effective technique for automated test input generation. However,
its scalability is limited due to the combinatorial explosion
of the path space and the high cost of computation. Several
sophisticated search strategies have been proposed to better guide
dynamic symbolic execution towards achieving high code coverage.
We propose a DSE-based approach for better performance
of dynamic symbolic execution in the context of generating test
inputs for maximum structural coverage. The development of
defect tests is still a very labor-intensive process that demands
a high level of domain knowledge, concentration and problem
awareness from software engineers. Any technology that can
reduce the manual effort involved in this process therefore has
the potential to significantly reduce software development costs
and time consumption.

Index Terms—Automated Test Generation, Test Reuse, Test Adequacy
Criteria, DOM, Coverage, Web Applications, Dynamic Symbolic
Execution, Software Testing.


I. INTRODUCTION 


Web applications have become one of the
fastest growing types of software systems today. Testing modern
web applications is challenging since multiple languages,
such as HTML, JavaScript, CSS, and server-side code, interact
with each other to create the application. The final result
of all these interactions at runtime is manifested through
the Document Object Model (DOM) [8] and presented to the
end-user in the browser. To avoid dealing with all these
complex interactions separately, many developers treat the
web application as a black box and test it via its manifested
DOM, using testing frameworks such as Selenium [7]. These
DOM-based test cases are written manually, which is a tedious
process with an incomplete result.

Although crawling-based techniques automate the testing to 
a great extent, they are limited in three areas: 

• Input values: Having valid input values is crucial for
proper coverage of the state space of the application.
Generating these input values automatically is challenging
since many web applications require a specific type,
value, and combination of inputs to expose the hidden
states behind input fields and forms.

• Paths to explore: Industrial web applications have a huge
state space. Covering the whole space is infeasible in
practice. To avoid unbounded exploration, which could
result in state explosion, users define constraints on the
depth of the path, exploration time or number of states.
Not knowing which paths are important to explore results
in obtaining a partial coverage of a specific region of the
application.

• Assertions: Any generated test case needs to assert the
application behavior. However, generating proper assertions
automatically without human knowledge is known to be
challenging. As a result, many web testing techniques
rely on generic invariants [5] or standard validators [1] to
avoid this problem.


In this paper, we propose to:


1) Mine the human knowledge existing in manually-written 
test cases. 

2) Combine that inferred knowledge with the power of 
automated crawling. 

3) Extend the test suite for uncovered/unchecked portions of 
the web application under test. 


We present our technique and tool called Testilizer [9],
which, given a set of Selenium test cases TC and the URL of
the application, automatically infers a model from TC, feeds
that model to a crawler to expand by exploring uncovered
paths and states, generates assertions for newly detected states
based on the patterns learned from TC, and finally generates
new test cases. We also propose a set of DOM-based
test adequacy criteria for web applications. These criteria
aim at measuring web application coverage at two different
granularity levels:


• The percentage of all DOM states covered in the total
state space of the application.

• The percentage of all elements covered in a particular
DOM state.


In general, our goal is not to replace code coverage but to
complement it with DOM coverage, a metric more tangible
for web developers and testers. We present a technique that
automatically extracts and measures the proposed DOM-based
adequacy criteria and generates a visual DOM coverage report.
This report helps web developers to spot untested portions of
their web applications. Our work makes the following main
contributions:


• A set of test adequacy criteria [6] targeting the DOM at
two granularity levels, namely:


1) Inter-state criteria focus on the overall state space of
the application and require each DOM state/transition
to be covered at least once.

2) Intra-state criteria examine each covered DOM state
separately and require each DOM element in the state
to be covered at least once.

• A technique and algorithm to dynamically compute the
coverage criteria.
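
As a rough illustration of the two granularity levels, the sketch below computes an inter-state ratio (covered DOM states over all known states) and an intra-state ratio (covered elements over all elements of one state). The function names, the set-based data layout and the example values are assumptions made for this illustration; they are not taken from the tool described here.

```python
def inter_state_coverage(all_states, covered_states):
    """Percentage of known DOM states visited by the test suite."""
    all_states = set(all_states)
    if not all_states:
        return 0.0
    return 100.0 * len(all_states & set(covered_states)) / len(all_states)


def intra_state_coverage(state_elements, covered_elements):
    """Percentage of the elements of one DOM state touched by the tests."""
    state_elements = set(state_elements)
    if not state_elements:
        return 0.0
    return 100.0 * len(state_elements & set(covered_elements)) / len(state_elements)


# Hypothetical application with four DOM states, three of them covered.
all_states = {"index", "login", "cart", "checkout"}
covered_states = {"index", "login", "cart"}

# Hypothetical elements of the "cart" state and the ones the tests touched.
cart_elements = {"#add", "#remove", "#total", "#coupon"}
cart_covered = {"#add", "#total"}

print(inter_state_coverage(all_states, covered_states))
print(intra_state_coverage(cart_elements, cart_covered))
```

A real implementation would also have to decide when two DOM snapshots count as the same state, which this sketch sidesteps by using plain string labels.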


Testing is a widely adopted technique to ensure software
quality in the software industry. About 50% of total
software project costs are devoted to testing. However, manual
testing is labour-intensive and error-prone. One attempt to
alleviate these difficulties is to develop techniques
that automate the generation of test inputs. Among the various
kinds of testing proposed, random testing, although often
considered less effective, is a preferred automated test
input generation technique in practice. The distinguishing
features of random testing are its high precision and scalability.
Random testing is independent of the complexity or the size of
the program: test inputs are generated by randomly sampling
the input space. For the same reason, random testing typically
cannot achieve high code coverage. Another automated test input
generation technique is symbolic execution [4]. In contrast to
random testing, it is a technique based on static program analysis.
The key idea behind symbolic execution is to use symbolic values
instead of actual data as input values, and to represent the
values of program variables as symbolic expressions. The
scalability of symbolic execution is limited due to the complex
constraints, data structures and native calls of real-world
programs. To intertwine the strengths of random testing and
symbolic execution, dynamic symbolic execution has recently
been proposed to automate unit testing of software.

Among these techniques, dynamic symbolic execution has
been gaining considerable attention in current industrial
practice (Cadar et al., 2011). It intertwines the strengths of
random testing and symbolic execution to obtain the scalability 
and high precision of dynamic analysis, and the power of 
the underlying constraint solver. One of the most important 
insights of dynamic symbolic execution is the ability to reduce 
the execution into a mix of concrete and symbolic execution 
when facing complicated pieces of code, which are the critical 
obstacle to pure symbolic execution. While effective, the 
fundamental scalability issue of dynamic symbolic execution is 
how to handle the combinatorial explosion of the path space, 
which is extremely large or infinite in sizable and complex 
programs. 
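
To make the idea concrete, the following toy sketch runs a small program on concrete inputs, records the branch decisions taken, and then negates one decision at a time to derive inputs for unexplored paths. It is only a sketch under strong assumptions: the program, the predicate table and the brute-force `solve` function (which stands in for a real constraint solver) are all hypothetical and chosen for illustration.

```python
from collections import deque

# Branch predicates of the toy program, keyed by a readable identifier.
PREDICATES = {
    "x>10": lambda x: x > 10,
    "x%7==3": lambda x: x % 7 == 3,
}


def program(x, trace):
    """Toy program under test; appends (predicate, outcome) to trace."""
    taken = PREDICATES["x>10"](x)
    trace.append(("x>10", taken))
    if taken:
        taken = PREDICATES["x%7==3"](x)
        trace.append(("x%7==3", taken))
        if taken:
            return "deep"
        return "mid"
    return "shallow"


def solve(constraints, domain=range(0, 100)):
    """Stand-in for a constraint solver: exhaustive search over a small domain."""
    for candidate in domain:
        if all(PREDICATES[name](candidate) == wanted for name, wanted in constraints):
            return candidate
    return None


def explore(seed):
    """Run concretely, then flip each recorded branch to reach new paths."""
    worklist = deque([seed])
    tried = set()
    results = set()
    while worklist:
        x = worklist.popleft()
        if x in tried:
            continue
        tried.add(x)
        trace = []
        results.add(program(x, trace))
        # Keep the path prefix and negate the decision at position i.
        for i, (name, taken) in enumerate(trace):
            flipped = trace[:i] + [(name, not taken)]
            new_input = solve(flipped)
            if new_input is not None and new_input not in tried:
                worklist.append(new_input)
    return results


print(explore(0))
```

Even in this toy setting every additional branch can double the number of paths to consider, which is the path explosion problem described above.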

In fact, covering all feasible paths of the program is
impractical. Besides, testing large programs and referring
to sophisticated criteria can often exceed the limit of a
typical testing budget. In the practice of software development,
therefore, high code coverage has long been advocated as
a convenient way to assess test adequacy. Specifically, the
testing process must ensure that every single code element in the
program is executed at least once. In this context, dynamic
symbolic execution can be conducted so as to cover all code
elements rather than exploring all feasible program paths. This
may lead to a significant reduction in the number of paths
that need to be explored.

Over the last decade the viability of software search engines,
especially for code, has received a tremendous boost due
to the emergence of large open source software repositories.
This in turn has provided the foundation for
a new generation of recommendation tools, especially code
recommendation systems, aimed at accelerating the software
development process by obviating the continual need to reinvent
the wheel. All of these recommendation systems have
the same aim: to automatically recommend generated or
reusable code artifacts of different sizes (i.e. from method call
statements to full-sized Java classes) for use in constructing
new software products, based on the information in their
repositories. A typical example is to suggest how classes from
a class library should be used based on the way they are
used in existing code. In terms of tool support for testing,
the current generation of IDE recommendation tools focuses
on increasing the quality of tests ex post, i.e. by using
various criteria to judge the tests written by a developer
after they were written. Our currently published repository
contains approximately 200,000 JUnit test files that can be
used to build a test search engine to support a kind of ex
ante recommendation tool for software tests. This paper
highlights automated test input generation and
describes tools and techniques for supporting test development
by automatically generating recommendations. Section
II covers related work. Section III presents the
approaches and recommendations. In the next section
we briefly survey related test input generation techniques based
on dynamic symbolic execution and our proposal towards
answering the research questions.


II. RELATED WORK


Based on this work, Sprenkle et al. propose a tool to 
generate additional test cases based on the captured user- 
session data. McAllister et al. leverage user interactions for 
web testing. Their method relies on prerecorded traces of 
user interactions and requires instrumenting one specific web 
application framework. None of these techniques considers 
leveraging knowledge from existing test cases as Testilizer 
does. 

Similarly, Xu et al. mine executable specifications of web
applications from Selenium test cases to create an abstraction
of the system. Yuan and Memon propose an approach to
iteratively rerun automatically generated test cases for
generating alternating test cases. This is in line with
feedback-directed testing, which leverages dynamic data produced by
executing the program using previously generated test cases.
For instance, Artemis is a feedback-directed tool for automated
testing of JavaScript applications that uses generic oracles such
as HTML validation. These approaches, however, do not use
information in existing test cases, and they do not address the
problem of test oracle generation. Yoo and Harman propose
a search-based approach to reuse and regenerate existing test
data for primitive data types. They show that the knowledge
of existing test data can help to improve the quality of newly
generated test data. Alshahwan and Harman generate new
sequences of HTTP requests through a def-use analysis of
server-side code. Pezze et al. present a technique to generate
integration test cases from existing unit test cases. Mirzaaghaei
et al. use test adaptation patterns in existing test cases to
support test suite evolution. This work is also related to test
suite augmentation techniques used in regression testing. In
test suite augmentation, the goal is to generate new test cases
for the changed parts of the application. More closely related to
our work is an approach that aggregates tests generated by
different approaches using a unified test case language. Its
authors propose a test advice framework that extracts information
from the existing tests to help improve other tests or test
generation techniques.

Our work is different from these approaches in that we (1) 
reuse knowledge in existing human-written test cases in the 
context of web application testing, (2) reuse input values and 
event sequences in test cases to explore alternative paths and 
new states of the web application, and (3) reuse oracles of the test
cases for regenerating assertions to improve the fault finding 
capability of the test suite. 

This paper makes the following four main contributions: 


• It introduces a search heuristic for automated test input
generation. The proposed technique balances computation
cost against the goal of achieving high code
coverage.

• It conducts two evaluations to measure the effectiveness
of the proposed technique, which give new
insights into theoretical expectations and practical
perspectives when using dynamic symbolic execution on
real-world programs.

• It proposes a coverage-driven DSE-based testing framework
that supports different (structural or logical) coverage
criteria through a unified coverage structure. The framework
is easy to implement on existing DSE-based tools with
different underlying path exploration strategies.

• Within this framework, a path filtering algorithm and a new
path exploration strategy are implemented to achieve
faster coverage-driven testing at lower testing cost.


A. A Search Engine for Tests 


Before we describe how automated test recommendation can
be achieved in a modern software development environment
like Eclipse, this section gives a brief overview of
our test case search engine, SENTRE, which drives the test
recommendation tool. Information extracted from the analyzed
test cases is stored in the index
and updated during a second analysis in which additional
information from dependencies may be gathered and stored.
After the index has been initially created, the test-driven search
technology is utilized to build a cluster of semantically related
test cases.


Fig. 1. Processing view of our approach


III. APPROACHES


Figure 1 depicts an overview of our approach. At a high level,
given the URL of a web application and its human written 
test suite, our approach mines the existing test suite to infer a 
model of the covered DOM states and event-based transitions 
including input values and assertions (blocks 1, 2, and 3). 
Using the inferred model as input, it explores alternative paths 
leading to new DOM states, thus expanding the model further 
(blocks 3 and 4). Next it regenerates assertions for the new 
states, based on the patterns found in the assertions of the 
existing test suite (block 5), and finally generates a new test 
suite from the extended model, which is a superset of the 
original human-written test suite (block 6). We discuss each 
of these steps in more detail in the following subsections.


1) Mining Human-Written Test Cases
To infer an initial model, in the first step we:


• Instrument and execute the human-written test suite
T to mine an intermediate dataset of test operations.

• Use this dataset to run the test operations and infer
a state-flow graph by analyzing DOM changes in the
browser after the execution of each test operation.


Instrumenting and executing the test suite: We instrument
the test suite (block 1 in Figure 1) to collect information
about DOM interactions, such as elements accessed in
actions (e.g., clicks) and assertions, as well as the structure
of the DOM states covered.

The instrumentation hooks into any code that interacts
with the DOM in any part of the test case, such as
test setup, helper methods, and assertions. Note that this
instrumentation does not affect the functionality of the
test cases.

Constructing the initial model: We model a web
application as a State-Flow Graph (SFG) that captures
the dynamic DOM states as nodes and the event-driven
transitions between them as edges.
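
A state-flow graph of this kind can be represented with a very small data structure. The sketch below is one possible layout, not the representation used by the tool; the class name `StateFlowGraph`, its methods and the example states and events are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StateFlowGraph:
    """DOM states as nodes, event-driven transitions as labelled edges."""

    states: set = field(default_factory=set)
    # Maps a source state to a list of (event, target state) pairs.
    edges: dict = field(default_factory=dict)

    def add_transition(self, source, event, target):
        """Record that firing event in source leads to target."""
        self.states.update((source, target))
        self.edges.setdefault(source, []).append((event, target))

    def successors(self, state):
        """States reachable from state through a single event."""
        return [target for _, target in self.edges.get(state, [])]


# Hypothetical model mined from a two-step login test.
sfg = StateFlowGraph()
sfg.add_transition("index", "click #login", "login_form")
sfg.add_transition("login_form", "submit credentials", "dashboard")

print(sorted(sfg.states))
print(sfg.successors("index"))
```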

2) Exploring Alternative Paths
At this stage, we have a state-flow graph that represents
the covered states and paths from the human-written test
suite. In order to further explore the web application to
find alternative paths and new states, we feed the graph
as a seed to an automated crawler. The exploration strategy can be
conducted in various ways:


• Remaining close to the manual-test paths.
• Diverging from the manual-test paths.
• Randomly exploring.


However, in this work, we have opted for the first option, 
namely staying close to the manual-test paths. The reason 
is to maximize the potential for reuse of and learning from 
existing assertions. Our insight is that if we diverge too
much from the manual-test paths and states, the human-written
assertions will also be too disparate and thus less
useful.
To find alternative paths, events are automatically gen- 
erated on DOM elements and if as a result the DOM 
is mutated, the new state and the corresponding event 
transition are added to the SFG. Note that the state 
comparison to determine a new state is carried out via 
the same state abstraction function used before. 
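
The expansion step can be sketched as follows. Everything in this example is hypothetical: the hash-based `abstract_state` function is only one conceivable state abstraction, and `fake_fire_event` replaces a real browser so that the sketch can run on its own.

```python
import hashlib


def abstract_state(dom_html):
    """Hypothetical state abstraction: ignore whitespace and case, then hash."""
    normalized = "".join(dom_html.split()).lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:8]


def expand(graph, current_dom, candidate_events, fire_event):
    """Fire each event; add a transition whenever the abstract state changes."""
    source = abstract_state(current_dom)
    graph.setdefault(source, [])
    for event in candidate_events:
        new_dom = fire_event(current_dom, event)
        target = abstract_state(new_dom)
        if target != source:  # the DOM was mutated, so record the new state
            graph.setdefault(target, [])
            graph[source].append((event, target))
    return graph


def fake_fire_event(dom, event):
    """Stand-in for a browser: only one event changes the page."""
    if event == "click #more":
        return dom + "<p>details</p>"
    return dom


start_dom = "<div id='more'>More</div>"
graph = expand({}, start_dom, ["click #more", "hover #more"], fake_fire_event)
print(graph)
```

Using the same abstraction function for mined and crawled states is what keeps the crawler from adding duplicates of states that the manual tests already cover.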

3) Regenerating Assertions
The next step is to generate assertions for the new
DOM states in the extended SFG. In this work, we
propose to leverage existing assertions to regenerate new
ones. By analyzing human-written assertions we can infer
information regarding:


• Portions of the page that are considered important for
testing. Extracting patterns from existing
assertions may therefore help us in generating new but
similar assertions.

• Patterns in the page that might be part of a template.

4) Test Suite Generation 

In the final step, we generate a test suite from the 
extended state-flow graph. Each path from the Index 
node to a sink node (i.e., node without outgoing edges) 
in the SFG is transformed into a unit test. Loops are 
included once. Each test case captures the sequence of 
events as well as any assertions for the target states. 
To make the test case more readable for the developers, 
information (such as tag name and attributes) about re- 
lated DOM elements is generated as code comments. After 
generating the extended test suite, we make sure that 
the reused/regenerated assertions are stable, i.e., do not 
falsely fail, when running the test suite on an unmodified 
version of the web application. Some of these assertions 
are not only DOM related but also depend on the specific 
path through which the DOM state is reached. Our 
technique automatically identifies and filters these false 
positive cases from the generated test suite. This is done 
through executing the generated test suite and eliminating 
failing assertions from the test cases iteratively, until all
tests pass successfully. 
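
The two ideas in this step, turning index-to-sink paths into test cases and iteratively dropping unstable assertions, can be sketched as below. The graph layout, the function names and the reading of "loops are included once" as "each edge is used at most once per path" are assumptions made for this illustration.

```python
def generate_paths(edges, start="index"):
    """Enumerate event sequences from start to sink nodes.

    Each edge is used at most once per path, so a loop is taken once.
    """
    paths = []

    def walk(node, used, events):
        outgoing = edges.get(node, [])
        if not outgoing:  # sink node: one complete test case
            paths.append(list(events))
            return
        for i, (event, target) in enumerate(outgoing):
            key = (node, i)
            if key in used:  # this edge was already taken on the current path
                continue
            walk(target, used | {key}, events + [event])

    walk(start, frozenset(), [])
    return paths


def drop_unstable(assertions, passes):
    """Remove failing assertions repeatedly until all remaining ones pass."""
    stable = list(assertions)
    while True:
        failing = [a for a in stable if not passes(a)]
        if not failing:
            return stable
        stable = [a for a in stable if a not in failing]


# Hypothetical extended SFG with a self-loop on the "cart" state.
edges = {
    "index": [("click #cart", "cart")],
    "cart": [("click #refresh", "cart"), ("click #pay", "done")],
    "done": [],
}
paths = generate_paths(edges)
for path in paths:
    print(path)
```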


Figure 2 sketches an abstract structure of the proposed
approach, which consists of three basic steps to perform
dynamic symbolic execution.


Fig. 2. The three steps of the proposed approach


Step 1: The test program is executed on a randomly generated
test input. The results of dynamic symbolic
execution performed in this execution are saved into
the repository and the program coverage is updated.

Step 2: The approach exploits the recently updated coverage
information to guide dynamic symbolic execution.
Specifically, the approach looks for a conditional
statement for which either the then branch is not
covered but the else branch was already explored,
or vice versa. If such a conditional statement exists,
the approach extracts a dynamic symbolic execution,
which already executed the explored branch, from
the repository.

Step 3: The approach performs random branch search. It
extracts the last executed dynamic symbolic execution
and randomly selects a branch to be flipped for
exploring new coverage. As soon as a new branch is
found in this step, the approach reverts back to the
second step to actively expose further coverage.
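
The branch selection described above (prefer a conditional with only one side covered, otherwise flip a random branch of the last execution) can be sketched in a few lines. The representation of an execution as a list of `(condition, outcome)` pairs and the name `choose_flip` are assumptions made for this example; a real tool would pass the flipped path to a constraint solver afterwards.

```python
import random


def negate(branch):
    """Return the opposite side of a (condition, outcome) branch."""
    condition, outcome = branch
    return (condition, not outcome)


def choose_flip(repository, covered, rng):
    """Pick an execution and the index of the branch to flip.

    repository: list of stored executions, each a non-empty list of branches.
    covered:    set of branches covered so far.
    """
    # Coverage-guided choice: a branch whose other side is still uncovered.
    for execution in repository:
        for i, branch in enumerate(execution):
            if negate(branch) not in covered:
                return execution, i, "coverage-guided"
    # Fallback: random branch search on the last stored execution.
    last = repository[-1]
    return last, rng.randrange(len(last)), "random"


# Hypothetical repository holding a single execution with two branches.
repository = [[("c1", True), ("c2", False)]]
covered = {("c1", True), ("c2", False)}

execution, index, strategy = choose_flip(repository, covered, random.Random(0))
print(strategy, execution[index])
```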


IV. AUTOMATED GENERATION OF TEST 
RECOMMENDATIONS 


To create a recommendation system for test
cases, we draw upon our index of analyzed test cases. Whereas
traditional software search engines do a forward search, in
which they try to find results based on the provided interface
of a class, we do a kind of reverse search that enables us to
search for the required interface of a class and thus to find test
cases by specifying the interface of the class under test (CuT).
The term reverse search is intended to capture the similarities to
traditional code reuse on the one hand, but also to capture the
idea that this approach in some sense reverses the approach of
test-driven reuse, where the tests are not the result of a search
but the query.

As an example, we consider the scenario of a developer
writing a part of a system which needs to be tested before
it can be integrated into a larger system context. In an
ordinary development environment one would need to write
the full test cases by hand. However, our environment should
allow developers to avoid much of this effort by suggesting
tests based on initial test cases already written by the user.
Since the tool works non-intrusively in the background and
smoothly integrates into normal working environments, the
developers' normal working practices are not disturbed and
they only need to break away from the task of writing new
test cases to consider already existing tests suggested by the
recommendation engine.


Fig. 3. Process Outline for Test Recommendation and Reuse

The fundamental process of acquiring a test recommendation
is as follows:


1) The process starts once the developer has written a class
and starts to write tests that, as well as describing how to
test the class, also describe its behavior (i.e. its semantics).

2) Subsequently, the recommendation system performs a
reverse search for test cases that require a similar interface
to the one provided by the class under test (CuT) and judges
their fitness for purpose by executing the developer's test
against the CuTs associated with the tests acquired by the
search.

3) Tests that pass this step identify possible test
recommendations, which are ranked.

4) The ranked recommendations are delivered to the user's IDE.
Since the tests' fitness for
purpose is also judged by the behavior of their associated
CuT, it is not necessary that the interface they test
matches the interface of the developer's CuT; this can be
dynamically adapted.

5) By reusing a test and abandoning other recommendations,
the developers provide feedback to the system, which
should be automatically analyzed and used to improve
the recommendation engine's future precision.
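
The interface-matching part of the reverse search can be illustrated with a small sketch. It covers only the search and ranking by interface similarity and leaves out the execution-based fitness check. The index layout, the use of Jaccard similarity, the threshold and all names are assumptions made for this example, not details of the search engine described here.

```python
def jaccard(a, b):
    """Similarity of two interfaces given as sets of (method, arity) pairs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def reverse_search(provided_interface, test_index, threshold=0.5):
    """Rank indexed tests by how well their required interface matches."""
    scored = [
        (jaccard(provided_interface, required), name)
        for name, required in test_index.items()
    ]
    hits = [(score, name) for score, name in scored if score >= threshold]
    # Best match first; ties are broken alphabetically for determinism.
    return [name for score, name in sorted(hits, key=lambda h: (-h[0], h[1]))]


# Interface provided by the developer's class under test (hypothetical).
provided = {("push", 1), ("pop", 0), ("isEmpty", 0)}

# Hypothetical index: test name -> interface the test requires of its CuT.
test_index = {
    "StackTest": {("push", 1), ("pop", 0), ("isEmpty", 0)},
    "DequeTest": {("push", 1), ("pop", 0), ("peekLast", 0)},
    "ParserTest": {("parse", 1)},
}

print(reverse_search(provided, test_index))
```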


V. AN ECLIPSE PLUG-IN FOR TEST RECOMMENDATION 


In this section we present Test Tenderer, a test recommendation
plug-in for Eclipse, which has been developed within
our group and is based on the lessons learned during the
development of the Code Conjurer semantic search plug-in.
Our goal is to suggest useful test cases to developers and help
them write better tests for the software they are developing.
Such an approach is only of value if its application does
not demand more effort and time than the original approach.
Therefore, as with traditional code recommendation systems, it
is important that such a system does not require developers to
significantly change their traditional behavior while creating
software. It should avoid generating further overhead by
requiring developers to write additional specifications or
learn new query languages. Our vision is thus to extract
all necessary information from the code under development,
including the main functional software that will be part of the
final product and the test cases that will be used to test it.


VI. EVALUATION 


To evaluate the effectiveness of our approach, we chose the
Vim editor to conduct experiments. Vim 5.7 contains roughly
150K lines of C code and, after instrumentation by CREST,
39166 branches. This test subject is large
enough to expose the impact of computation costs on the
effectiveness of search techniques. For comparison,
we chose two search heuristics implemented in CREST,
CfgDirectedSearch and RandomBranchSearch.
These heuristics have been shown to be far more effective than
others such as bounded depth-first search and random testing [2].
CfgDirectedSearch is a sophisticated technique that improves
code coverage by solving control dependencies. In contrast,
RandomBranchSearch is a simple random branch flipping
technique; it does not take into account any coverage guidance.

To assess the efficacy of our proposed technique, we have 
conducted a controlled experiment to address the following 
research questions: 


RQ1: How much of the information (input data, event se- 
quences, and assertions) in the original human-written 
test suite is leveraged by Testilizer? 

RQ2: How successful is Testilizer in regenerating effective 
assertions? 

RQ3: Does Testilizer improve coverage? 

RQ4: Is there a correlation between code-based and DOM- 
based test adequacy criteria? 

RQ5: Is DomCovery effective in helping testers identify
covered and untested portions of a web application
under test?

RQ6: Among the large program path space, what techniques 
are effective to direct dynamic symbolic execution 
towards achieving high code coverage? 

RQ7: As the cost of analyzing and executing large programs
becomes expensive, what techniques are cost-effective
to guide dynamic symbolic execution to achieve high
code coverage?

RQ8: Based on the traditional DSE-based approach 
(DFS-based search strategy), how much does the ef- 
ficiency increase with the help of our coverage-driven 
testing framework (with path filtering)? 


The following is a summary of the findings. 


RQ1: Original SFG Coverage

As expected, the number of states, transitions, and
generated test cases is higher in Testilizer. The random
exploration (RAND) on average generates fewer states
and transitions, but more test cases, compared to the
original test suite. This is mainly due to the fact that
in the SFG generated by RAND there are more paths
from the index to the sink nodes than in the SFG mined
from the original test suite.

Regarding the usage of original test suite information
(RQ1), as expected Testilizer, which leverages the
event sequences and inputs of the original test suite,
has almost full state (98%) and transition (96%)
coverage of the initial model. By analyzing the generated
test suites, we found that, on average, Testilizer
reused 22 input values (in addition to the login data)
from the average of 15 original inputs. The RAND
exploration approach covered about 60% of the states
and transitions, without any usage of input data (apart
from the login data, which was provided to RAND
manually). The difference between the number of actually
generated assertions and the stable ones reveals
that our generated assertions (combined, similar/exact
generated) are more stable than those of the random approach.

RQ2: Fault Detection

A comparison of fault detection rates for
the different methods shows that exact and similar
generated assertions are more effective than original
and reused ones. Taken alone, no single assertion
generation technique is more effective than the random
approach. This is mainly due to the fact that the
number of random assertions per state is higher than
the number of assertions reused/generated by Testilizer, since we
always select 5 random assertions at each state from a
pool of assertions but do not always find 5 exact/similar
matches in a state.

RQ3: Code Coverage

Although code coverage improvement is not the main
goal of Testilizer [9] in this work, the generated test
suite has a slightly higher code coverage. There is a 30%
improvement (6% increase) over the original test
suite and an 18% improvement (4% increase) over the
RAND test suite. Note that the original test suites were
already equipped with proper input data, but not many
execution paths (thus the slight increase). On the other
hand, the random exploration considered more paths
in a blind search, but without proper input data.
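
The two pairs of figures can be related by simple arithmetic if, as an assumption, the larger number is read as a relative improvement and the number in parentheses as an absolute increase in percentage points. The baseline values computed below are implied by that reading only; they are not reported results.

```python
def implied_baseline(point_increase, relative_improvement_pct):
    """Baseline coverage (in percent) implied by an absolute and a relative gain."""
    return point_increase * 100.0 / relative_improvement_pct


# Assumed reading: 6 points = 30 % relative gain, 4 points = 18 % relative gain.
original_baseline = implied_baseline(6, 30)
rand_baseline = implied_baseline(4, 18)

print(f"original suite: {original_baseline:.1f}% -> {original_baseline + 6:.1f}%")
print(f"RAND suite:     {rand_baseline:.1f}% -> {rand_baseline + 4:.1f}%")
```

Under this reading both comparisons point to a final coverage of roughly 26% for the generated test suite, which suggests the two pairs of numbers are at least consistent with each other.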

RQ4: Correlation Analysis

To address RQ4, we measured correlations between
client-side JavaScript code coverage metrics, i.e., function,
statement and branch coverage, and DOM coverage
criteria, i.e., DSC (state), DTC (transition), EEC (explicit
element), AEC (actionable element), IEC (implicit
element), and CEC (checked element) coverage.
As the results show, there is no strong correlation
between code-based metrics and DOM-based metrics.
For instance, the correlation coefficient of statement
coverage and DOM state coverage (DSC) is 0.19,
which is considered as no correlation. The same is true
for all the other data points in the table except for IEC.
IEC shows a moderate correlation with function (0.43),
statement (0.47), and branch (0.43) coverage.
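
The text does not say which correlation coefficient was used. As an assumption, the sketch below uses the Pearson coefficient and applies it to made-up coverage values; neither the choice of coefficient nor the data comes from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical per-test-suite measurements (percent).
statement_coverage = [35.0, 42.0, 50.0, 61.0, 48.0]
dom_state_coverage = [70.0, 55.0, 80.0, 60.0, 75.0]

r = pearson(statement_coverage, dom_state_coverage)
print(f"r = {r:.2f}")
```

A value close to 0 indicates no linear relationship, while values near 1 or -1 indicate a strong positive or negative one.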

RQ5: Controlled Experiment

To address RQ5, we conducted a controlled experiment.
In this experiment, we assessed how effective
DomCovery [10] is compared to current web
application code coverage and development tools.
Coverage Improvements 

The purpose of this first evaluation is to address RQ6:
which techniques are effective in guiding dynamic
symbolic execution to achieve high code coverage. The
recorded data show that RandomBranchSearch is much
less effective than all the others in achieving
coverage: for every given number of runs, it achieved
the lowest coverage result. This emphasizes the need
for coverage guidance when performing dynamic symbolic
execution for maximum code coverage. For the first two
numbers of runs, 500 and 1000, CfgDirectedSearch
obtained higher coverage than our approach,
CoverageRandomSearch, but afterwards our search
strategy maintained the highest coverage results.
Another important observation from this evaluation is
that RandomBranchSearch, while less effective in
achieving branch coverage, took significantly less time
than the others. CfgDirectedSearch, on the other hand,
used the largest amount of time for every number of
runs. That is, the computation cost of solving the
control dependencies required by this search strategy
is considerable.
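
The difference between blind and coverage-guided branch selection
can be illustrated with a toy sketch. It is not the
CoverageRandomSearch implementation evaluated above, whose details
are not given here. The function pick_branch and the branch
identifiers are hypothetical, and the rule shown (prefer branches
that no earlier run has covered) is only one simple form of
coverage guidance.

```python
import random


def pick_branch(candidates, covered, rng):
    """Pick the branch to negate in the next run.

    candidates: branch ids seen on the current execution path.
    covered: set of branch ids already exercised by earlier runs.
    Uncovered branches are preferred; if none remain, fall back to
    a uniformly random choice (plain random branch search).
    """
    if not candidates:
        raise ValueError("no candidate branches")
    uncovered = [b for b in candidates if b not in covered]
    pool = uncovered if uncovered else list(candidates)
    return rng.choice(pool)


rng = random.Random(0)      # seeded only to make the sketch repeatable
covered = {"b1", "b2"}      # hypothetical coverage recorded so far
choice = pick_branch(["b1", "b2", "b3"], covered, rng)
```

Here only b3 is still uncovered, so it is the branch selected. A
purely random strategy would pick any of the three branches with
equal probability.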

Cost-Effectiveness 

The purpose of this second evaluation is to address
RQ7: which techniques are cost-effective for performing
dynamic symbolic execution for maximum code coverage.
RandomBranchSearch, with its lower computation cost,
performed the largest number of runs for every amount
of time allocated. Our approach achieved better
coverage results than RandomBranchSearch in three
cases, 120 s, 180 s and 240 s, but in the other cases
its results were slightly lower. In the case of
CfgDirectedSearch, as foreseen, the large computation
cost required severely degrades its effectiveness. This
search strategy performed the lowest number of runs
and achieved the lowest coverage results in most cases.
Efficiency and Effectiveness. 
In this evaluation, we use the number of program
iterations and solved constraints as measurements of
the performance of coverage testing. Note that one
program iteration indicates that one feasible path
candidate is exercised and one test case is generated.
For a DSE-based approach, program execution and
constraint solving account for most of the testing cost.


VII. CONCLUSION 


In this paper we have briefly presented a strategy for
supporting automated test recommendation for software
developers in their software development environment. This work
is motivated by the fact that a human-written test suite is a
valuable source of domain knowledge, which can be used to
tackle some of the challenges in automated web application
test generation. Given a web application and its DOM-based
(such as Selenium) test suite, our tool, called Testilizer, utilizes
the given test suite to generate effective test cases by exploring
alternative paths of the application and regenerating assertions
for newly detected states. Our empirical results on four
real-world applications show that Testilizer easily outperforms a
random test generation technique and provides substantial
improvements in the fault detection rate compared with the
original test suite, while slightly increasing code coverage
too. The results of the coverage improvement and
cost-effectiveness evaluations of the search heuristic show the
effectiveness of the proposed approach. A DSE-based
coverage-driven testing framework is also proposed. It supports
both structural and logical coverage criteria through a unified
coverage structure, and it is easy to implement on existing
DSE-based tools. Since the
effort involved in writing good tests for software components
is still very high and demands a lot of human resources,
we consider research in this area promising and of high
potential value. To reinforce our theoretical vision of how
a recommendation system may help developers increase their
productivity in testing and at the same time harvest the
knowledge stored in tests written by other developers, testers
and domain experts.
Abstract—Robotics is a branch of engineering and technology
that deals with the design, construction, operation and
application of robots. This paper mainly focuses on HRI (Human-Robot
Interaction) and HRI problems, the social acceptance and impact
of robots, cloud robotics and its applications, and some other
applications of robotics. HRI is a field of study dedicated to
understanding, designing and evaluating robotic systems used
by humans. The cloud has the potential to enhance a broad range
of robots and automation systems. Finally, the paper notes that
the results of research in these technologies could end up taking
over most jobs from humans.

Index Terms—Human Robot Interaction, Cloud Robotics, 
Artificial Intelligence 


I. INTRODUCTION 


Today, it is increasingly common for people to use or come
into contact with robots in various situations: at home and
in retail stores, hotels, hospitals, etc. Robots are mainly
classified into several types based on their functionality, such
as service and utility robots or those designed to communicate
with humans, and on their appearance, such as humanoid robots or
mechanical robots. The type of robot to which each country
attaches particular importance in the advance of robotics
reflects the sense of values and preferences of its population. A
robot is a mechanical intelligent agent which can perform tasks
on its own or with guidance. In practice a robot is usually an
electro-mechanical machine which is guided by computer and
electronic programming. Robots can be autonomous or
semi-autonomous.

The first section discusses HRI and HRI problems. HRI
is a field of study dedicated to understanding, designing and
evaluating robotic systems used by humans. Interaction means
communication between humans and robots, which may take
several forms.

The second section covers cloud robotics and automation
systems. A cloud robot and automation system is defined
as any robot or automation system that depends on data
or code from a network to support its operation. The cloud
also provides economies of scale and facilitates sharing of
data across applications and users. The third section of this
paper deals with the social acceptance and impact of robots
and artificial intelligence. The following sections deal with
some applications of robotics and research results.


and Its Applications 


Reji C Joy 
II. HUMAN-ROBOT INTERACTION


HRI is a highly interdisciplinary area, at the intersection
of robotics, engineering, computer science, psychology,
linguistics, ethology and other disciplines, investigating social
behaviour, communication and intelligence in natural and
artificial systems. Human-Robot Interaction (HRI) is a field of
study dedicated to understanding, designing, and evaluating
robotic systems for use by or with humans. Interaction, by
definition, requires communication between robots and humans [1].
Communication between a human and a robot may take several
forms, but these forms are largely influenced by whether the
human and the robot are in close proximity to each other or
not. Thus, communication and, therefore, interaction can be
separated into two general categories:


1) Remote interaction: The human and the robot are not
co-located and are separated spatially or even temporally
(for example, the Mars Rovers are separated from Earth
both in space and time).

2) Proximate interaction: The humans and the robots are 
co-located (for example, service robots may be in the 
same room as humans). 


The HRI problem is to understand and shape the interactions
between one or more humans and one or more robots.
Interactions between humans and robots are one of the main
factors in all of robotics: even so-called autonomous robots
are still used by, and do work for, humans, and
may act on human commands. As a result, the essential
components of HRI are evaluating the capabilities of humans
and robots, and designing the technologies and training that
produce desirable interactions. Such work requires
contributions from cognitive science, linguistics, and psychology;
from engineering, mathematics, and computer science; and
from human factors engineering and design. Five attributes
mainly affect the interactions between humans and robots:


1) Level and behavior of autonomy 
2) Nature of information exchange 
3) Structure of the team 
4) Adaptation, learning, and training of people and the robot,
and 
5) Shape of the task. 


A. Autonomy 


Designing autonomy consists of mapping inputs from the
environment into actuator movements, representational
schemas, or speech acts. One operational characterization of
autonomy that applies to mobile robots is the amount of time
that a robot can be neglected, or the neglect tolerance of
the robot. A system with a high level of autonomy is one
that can be neglected, without interaction, for a long period
of time.

Autonomy is not an end in itself in the field of HRI, but
rather a means of supporting productive interaction between
humans and robots. Indeed, autonomy is only useful insofar
as it supports beneficial interaction between a human and a
robot. Consequently, the physical manifestation and type of
autonomy vary dramatically across robot platforms.


B. Information Exchange 


Autonomy is only one of the components required to make
an interaction worthwhile. A second component is the manner
in which information is exchanged between the human and the
robot. Measures of the efficiency of an interaction include the
interaction time required for intent and/or instructions to be
communicated to the robot, the cognitive or mental workload
of an interaction, the amount of situation awareness produced
by the interaction, and the amount of shared understanding
between humans and robots. The two primary dimensions that
determine the way information is exchanged between a human
and a robot are the communications medium and the format
of the communications. The primary media are delineated by
three of the five senses: seeing, hearing, and touch. These media
are manifested in HRI as follows:


1) Visual displays, typically presented as graphical user
interfaces or augmented reality interfaces

2) Gestures, including hand and facial movements and
movement-based signaling of intent

3) Speech and natural language, which include both auditory 
speech and text-based responses, and which frequently 
emphasize dialog and mixed-initiative interaction. 

4) Non-speech audio, frequently used in alerting 

5) Physical interaction and haptics, frequently used remotely 
in augmented reality or in teleoperation to invoke a sense 
of presence especially in telemanipulation tasks, and also 
frequently used proximately to promote emotional, social, 
and assistive exchanges 


C. Teams 


HRI problems are not restricted to a single human and a
single robot, though this is certainly one important type of
interaction. An interaction may involve more than one human
or robot, such as a team whose members interact with one
robot or with several remote robots. In addition to the
number of humans and robots in a team, a key problem is the
organization of the team. One important organizational
question is who has the authority to make certain decisions:
robot, interface software, or human? Another important
question is who has the authority to issue instructions or
commands to the robot and at what level: strategic, tactical,
or operational? A third important question is how conflicts
are resolved, especially when robots are placed in peer-like
relationships with multiple humans. A fourth question is how
roles are defined and supported: is the robot a peer, an
assistant, or a slave; does it report to another robot, to a
human, or is it fully independent? Spanning all of these
questions is whether the organizational structure is static
or dynamic, with changes in responsibilities, authorities, and
roles.


D. Adaptation, Learning, and Training 


Although robot adaptation and learning have been addressed
by many researchers, training of humans appears to have
received comparatively little attention in HRI, even though
this area is very important. One reason for this apparent trend
is that an often unstated goal of HRI is to produce systems
that do not require significant training. This may be because
many robot systems are designed to be used in very specific
domains for brief periods of time. Moreover, robot learning
and adaptation are often treated as useful in behavior design
and in task-specific learning, though adaptation is certainly
a key element of long-term interactions between humans and
robots. On one hand, it is important to minimize the amount of
human training and adaptation required to interact with robots
that are used in therapeutic or educational roles for children,
autistic individuals, or mentally challenged individuals. On the
other hand, it is important that HRI include proper training
for problems that involve, for example, handling hazardous
materials; similarly, the very nature of using robots in
therapeutic and educational roles requires that humans
adapt to and learn from the interaction.

The following points discuss training, including how it can be
used to help robots evolve new skills in new application domains:


1) Minimizing Operator Training
Minimizing training appears to be an implicit goal for
edutainment robots, which include robots designed for
use in classrooms and museums, for personal
entertainment, and for home use. These robots are typically
designed to be manageable by a wide variety of humans, and
training can range from instruction manuals to instruction
from a researcher or instructions from the robot itself.

2) Efforts to Train Humans 
In contrast to the goal of minimizing training in edutain- 
ment robots, some application domains involving remote 
robots require careful training because operator workload 
or risk is so high. Important examples of such training 
are found in military and police applications, space ap- 
plications, and in search and rescue applications. 

3) Training Designers 
Importantly, an often overlooked area is the training of
HRI researchers and designers in the procedures and
practices of those whom they seek to help.

4) Training Robots
It is tempting to restrict training to the education of the
human side of HRI, but this would be a mistake given
current HRI research. In HRI, robots are also learning,
both offline as part of the design process and online as
part of interaction, especially long-term interaction.


E. Task-Shaping 


Robotic technology is introduced to a domain either to allow 
a human to do a task that they could not do before, or to make 
the task easier or more pleasant for the human. Implicit in this 
assertion is the fact that introducing technology fundamentally 
changes the way that humans do the task. Task-shaping is a 
term that emphasizes the importance of considering how the 
task should be done and will be done when new technology 
is introduced[1]. Compared to the other ways that a designer 
can shape 


III. CLOUD ROBOTICS 


Cloud robotics improves the functionality of simple
robots by relying on cloud computing as an extra
source of memory and processing. Combining the physical
features of ordinary robots with the virtual
innovations of cloud computing results in smarter robots
with great tolerance for possible intelligent adjustments. Cloud
robotics can be used as the centralized processing and storage
technology behind ordinary robots. Customers would have
their robots connected to the cloud as the center of their data
or code, strengthening the robots' operation with intelligent
sensor analysis and insight from the accumulated data [2]. Cloud
robotics facilitates reliable and agile connection and sharing of
robot resources instead of relying on standalone robots. Therefore,
robots will carry out their tasks while cooperating through the
cloud to expand their processing capabilities.

By offloading intensive computations from robots to the
cloud, cloud robotics provides the required infrastructure
virtually. The robotics industry can use whatever processing
and storage it needs through cloud robotics without
worrying about changing or upgrading local servers
and sub-systems. Robots can communicate and share their
data collectively through their wireless connections
to the cloud. Moreover, they can implement their real-time
applications relying on the parallel computational resources of
the cloud. Sharing data and accessing and updating
remote libraries of images, maps, and object data also become
possible. Hence, cloud robotics can overcome current
issues in the robotics industry and pave the way for greater
penetration of robots into our daily lives while improving
the quality of life.

AI and computer vision are indispensable elements of
robotics applications these days. There has been a lot of interest
in AI applications such as Simultaneous Localization and
Mapping (SLAM), grasping, navigation, etc. These applications
require more agile and complex processors than ordinary
robots have in order to implement their real-time and
computationally heavy functionality. As such workloads exceed the
capacity of ordinary robots, supporting the AI functions demands
equipping and upgrading current robots with high-priced
processors and their prerequisites. By offloading
the heavy computation from the robot to a server instead, the
robot's power consumption decreases considerably, and the computing
capacity of the robot can be used for its other capabilities.


A. Big Data 


Big data and cloud computing have become the next
big step for robot intelligence [3]. "Big Data" describes "data
that exceeds the processing capacity of conventional database
systems", including images, video, maps, real-time network
and financial transactions, and vast networks of sensors.


B. Cloud Computing 


Cloud computing is a promising technology which is
intended to harness the power of networked computers and
communication systems in a more cost-effective way. There is
a wide gap between cloud computing technology and robotics
technology. Generally, robots cannot use a cloud computing
instance directly.

Moreover, many cloud computing platforms are
available, and deciding which one to use, and when, is still an
issue. In this paper we take a look at some prominent open
source cloud solutions and evaluate and compare the stability,
performance and features of these clouds keeping the key
requirements of cloud robotics under consideration.


C. Collective Robot Learning 


It is possible for multiple robots to share their experience 
with one another, and thereby, learn a policy collectively. In 
this work, we explore distributed and asynchronous policy 
learning as a means to achieve generalization and improved 
training times on challenging, real-world manipulation tasks. 
We show that it achieves better generalization, utilization, and 
training times than the single robot alternative. 

We consider a system architecture for cloud robotics and then
focus on the two key enabling sub-systems: the M2M communication
framework and the elastic computing architecture. Our cloud
robotics architecture differs from existing solutions in that it
leverages two complementary clouds (i.e., a virtual ad-hoc cloud
and an infrastructure cloud).

On the M2M (machine-to-machine) level, a group of robots
communicate via wireless links to form a collaborative computing
unit. The benefits of forming a collaborative computing unit are
multifold. First, the computing capability of individual robots
can be pooled together to form a virtual ad-hoc cloud
infrastructure [4]. Second, within the collaborative computing
unit, information can be exchanged for collaborative decision
making in various robot-related applications. Finally, it allows
robots not within communication range of a cloud access
point to access information stored in the cloud infrastructure
or send computational requests to the cloud. On the M2C
(machine-to-cloud) level, the centralized cloud infrastructure
provides a pool of shared computation and storage resources that
can be allocated elastically for real-time demand. First, it can
unify a large volume of information about the environment, which
can be organized in a format usable by robots. Second, it can provide
an extensive library of skills or behaviors that are related
to task requirements and situational complexities, making it
feasible to learn from the history of all cloud-enabled robots.

Fig. 1. Machine-to-machine (M2M) communication between
neighboring robots

Elastic Cloud Computing Architecture. 

We focus on the following three elastic computing models: 


1) Peer-Based Model: each robot and each virtual machine 
(VM) in the cloud is considered as a computing unit. 
These robots and VMs form a fully distributed computing 
mesh. A task can be divided into smaller modules for 
execution over a subset of the nodes in the computing 
mesh. 

2) Proxy-Based Model: in the group of networked robots, 
one unit functions as a group leader, communicating with 
a proxy VM in the cloud infrastructure, to bridge the 
interaction between the robotic network and the cloud. 
The set of computing units are organized into a two-tier 
hierarchy. 

3) Clone-Based Model: each robot corresponds to a system- 
level clone in the cloud. A task can be executed in the 
robot or in its clone. The set of robotic clones also form 
a peer-to-peer network with better connectivity than the 
physical ad-hoc M2M network. Moreover, this model 
allows for sporadic outage in the physical M2M network. 
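
To make the clone-based model more concrete, the following minimal
sketch decides whether a task should run on the robot or on its
cloud clone by comparing estimated completion times. It is an
illustration under stated assumptions rather than a method given in
the text: the Task fields, the function choose_executor, the cost
model (transfer time plus compute time) and all numbers are
hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    cycles: float        # compute work in CPU cycles (assumed known)
    payload_bits: float  # data to ship to the clone if offloaded


def choose_executor(task, robot_hz, clone_hz, link_bps):
    """Return 'robot' or 'clone', whichever is estimated to finish first.

    link_bps <= 0 models an outage of the robot-to-cloud link, in
    which case the task has to stay on the robot.
    """
    local_s = task.cycles / robot_hz
    if link_bps <= 0:
        return "robot"
    # Offloading pays for the transfer, then runs on the faster clone.
    remote_s = task.payload_bits / link_bps + task.cycles / clone_hz
    return "clone" if remote_s < local_s else "robot"


# Hypothetical workload: 2e9 cycles of work and 8 Mbit of sensor data.
task = Task(cycles=2e9, payload_bits=8e6)
fast_link = choose_executor(task, robot_hz=1e9, clone_hz=1e10, link_bps=1e7)
slow_link = choose_executor(task, robot_hz=1e9, clone_hz=1e10, link_bps=1e6)
no_link = choose_executor(task, robot_hz=1e9, clone_hz=1e10, link_bps=0)
```

With the fast link the estimated remote time (0.8 s transfer plus
0.2 s compute) beats the 2 s local time, so the clone is chosen. With
the slow link or during an outage the task stays on the robot.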


IV. SOCIAL ACCEPTANCE OF ROBOTS AND ARTIFICIAL 
INTELLIGENCE 


A. Current Status of Social Acceptance of Robots and Artifi- 
cial Intelligence 


In November 2015, NRI conducted an online consumer
survey in Japan, the U.S. and Germany on the topic of robots
and artificial intelligence. The survey results revealed
differences among consumers in the three countries in terms
of their knowledge, acceptance and usage intention of these
technologies. The country-by-country differences in robotics
development trends are considered to be largely influenced by
the sense of values and preferences of the population of each
country. The U.S. has the highest level of robot utilization at
home and in retail stores, with its people being the most
enthusiastic about the future use of robots. Germany shows a strong
tendency to consider robots for industrial purposes, and its
people feel strong resistance to the presence of robots in their
households. In Japan, with its rapidly aging population, a higher
number of people expect to utilize nursing care robots. As
such, Japan is likely to see its market for nursing care robots
develop ahead of the rest of the world.

Direct contact between American consumers and robots
has been increasing in business-to-business-to-consumer
(B2B2C) fields, for example through robots installed by
retailers in stores, and is expected to grow further. Robots of
this and similar types include (1) ones that help consumers
find items in stores, (2) room-service robots in hotels and (3)
medical robots that deliver telemedicine service to patients.
These types of robots are designed to assist consumers in
receiving more convenient services. Both in Japan and in the
U.S., the majority of robots being sold in the market are ones
that perform specific functions, such as Roomba, the floor
cleaning robot [5]. Robots that communicate with humans
are called social robots. Due to increased media coverage,
social robots have quickly become well recognized among
consumers. Social robots can be classified into several types
based on their functionality and appearance. Using this
classification framework, there are three main groups of
robots.

The first group includes utility robots with a mechanical
appearance, such as Roomba and Amazon's
voice-activated smart speaker, Echo. The second group
includes robots like Jibo and Buddy that are not humanoid
robots but are categorized as social robots because of their
ability to communicate with humans. The third group consists
of robots with human-like appearance, such as Pepper,
that also communicate with humans.


B. Differences in Attitudes towards and Acceptance of Robots 
in Japan, the U.S. and Germany 


In the NRI survey, a robot is defined as a machine that can 
autonomously assist humans without relying on continuous 
instructions or programming. 

In all three countries, about 60 to 70 percent of respondents
said that they felt "Very comfortable" or "Somewhat comfortable"
about robots being part of their daily lives, revealing that the
percentages of respondents feeling resistance (uncomfortable)
are low.

The survey asked whether social robots need a human-like
face and changing facial expressions. About 50 percent of
respondents in each of these three countries answered either
"Definitely needed" or "Somewhat needed".
NRI conducted a survey of Japanese, American and German
consumers on the appearance of robots to identify the shape
and surface that consumers find favorable. What appearance
makes a robot more likeable? Does the shape of a social robot
need to be human-like? To find answers to these
questions, the survey provided four options: (1) a
robot whose shape and surface are like a real human, (2) a robot
whose shape is human-like but whose surface is different from
a human (Pepper, etc.), (3) a robot whose shape and surface are
different from a human (Jibo, etc.) and (4) a robot whose
appearance resembles an animal, such as a dog or cat. The survey
revealed that in all three countries, the larger proportion of
respondents found robots whose shape is human-like but
whose surface is different from a human very or somewhat
favorable, and felt uncomfortable with robots similar to real
humans. The U.S. had the largest proportion of respondents
who found option (3), a robot whose shape and surface are
different from a human, very or somewhat favorable, suggesting
that consumers there do not care much whether the shape
of a robot is human-like. Large proportions of Japanese and
American respondents were very or somewhat favorable towards
animal-shaped robots; the proportion was low in Germany.
These findings suggest that a large proportion of German
respondents consider that robots should appear more
machine-like.


C. Needs and Potential of Nursing care Robots 


While nursing care is similar to medical treatment
in that both services are directly related to a person's health, the
percentage of respondents who want to use robots in nursing
care facilities, in ways such as communicating with robots and
being assisted by robots, was about 10 points higher than that
for using robots for medical services in all three countries. In
all three countries, respondents showed relatively low intention
to use robots in the fields of education and medical services.
The survey asked the respondents whether they knew that a
research paper on nursing facilities had shown that robots in the
shapes of animals had healing effects on residents. There was a
large difference between Japanese and American respondents
in terms of the degree of recognition. The proportions of
Japanese and American respondents who replied
"Favorable" both exceeded 60 percent. The acceptability of robots
in the nursing care field is high in Japan, where the population
is aging faster than in any other country.


D. Senses of Values Related to Science and Technology in 
Japan, the U.S. and Germany, which Influence the Introduction 
of Robots and AI 


Regarding self-driving cars, the proportions of respondents
who said "Very much want to use" or "Somewhat want to
use" were high, at about 60 percent, in both Japan and the
U.S. The percentage of respondents who replied
"Completely acceptable" or "Somewhat acceptable" for Artificial
Intelligence-based phone operator systems was highest in
Japan. In contrast, Germany scored low percentages in terms
of both the intention to use a self-driving car and acceptance
of phone operator systems using Artificial Intelligence.

The survey asked the respondents whether they knew of the
concern that, if computer technologies keep growing at their
current speed, the world will reach a
technological singularity by 2045, where Artificial Intelligence
will gain greater thinking ability than that of all humans. The
proportions of American and German respondents
who were aware of this issue before the day of the survey were
56 percent and 55 percent, respectively, whereas the proportion
among their Japanese counterparts was somewhat lower, at 45
percent.


V. CLOUD ROBOTICS APPLICATIONS 
A. SLAM 


In robotic mapping and navigation, simultaneous localiza- 
tion and mapping (SLAM) is the computational problem of 
constructing or updating a map of an unknown environment 
while simultaneously keeping track of an agent’s location 
within it. SLAM algorithms are tailored to the available 
resources, hence not aimed at perfection, but at operational 
compliance[4]. Published approaches are employed in self- 
driving cars, unmanned aerial vehicles, autonomous underwa- 
ter vehicles, planetary rovers, newly emerging domestic robots 
and even inside the human body. 


B. Grasping 


Grasping in Robotics contains original contributions in the 
field of grasping in robotics with a broad multidisciplinary 
approach. This gives the possibility of addressing all the 
major issues related to robotized grasping, including mile- 
stones in grasping through the centuries, mechanical design 
issues, control issues, modelling achievements and issues, 
formulations and software for simulation purposes, sensors 
and vision integration, applications in industrial field and 
non-conventional applications (including service robotics and 
agriculture). 


C. Navigation 


Autonomous vehicles require advanced sensing
technologies in order to be able to safely share the environment
with human operators. Those sensing technologies are in fact
necessary for identifying the presence of unforeseen objects
and measuring their position and velocity. Furthermore,
classification is necessary for effectively predicting their behavior.
In this paper we consider the presence of sensing systems 
both on-board each vehicle, and installed on infrastructural 
elements. While the simultaneous presence of multiple sources 
of information heavily improves the amount (and quality) 
of available data, it generates the need for effective data 
fusion and storage systems. Hence, we introduce a centralized 
cloud service, that is in charge of receiving and merging data 
acquired by different sensing systems. Those data are then 
distributed to the autonomous vehicles, that exploit them for 
implementing advanced navigation strategies. 
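
As a minimal sketch of the kind of merging such a cloud service might
perform, the example below fuses two independent estimates of the
same quantity by inverse-variance weighting. The text does not say
which fusion method is used, so this technique, the function
fuse_estimates and the sample readings are all assumptions chosen for
illustration.

```python
def fuse_estimates(estimates):
    """Fuse independent (value, variance) estimates of one quantity.

    Each estimate is weighted by the inverse of its variance, so more
    precise sensors count for more. Returns (fused_value, fused_variance).
    """
    if not estimates:
        raise ValueError("at least one estimate is required")
    weights = []
    for _, variance in estimates:
        if variance <= 0:
            raise ValueError("variances must be positive")
        weights.append(1.0 / variance)
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    return value, 1.0 / total


# Hypothetical readings of an obstacle's position along a lane, in metres,
# given as (value, variance) pairs.
on_board = (10.0, 4.0)   # noisier sensor mounted on the vehicle
roadside = (12.0, 1.0)   # more precise sensor on the infrastructure
position, variance = fuse_estimates([on_board, roadside])
```

The fused position lies closer to the more precise roadside reading,
and the fused variance is smaller than that of either sensor alone,
which is the benefit of combining several sources.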
VI. SOME OTHER APPLICATIONS
A. Automation of agricultural production under greenhouse 


Control systems for pests, diseases and weeds currently
use a computer vision system that implements image
processing as the element that identifies the medium. In these
cases, the mobile robotic platform is part of the structure that
allows movement through the environment; when the system
does not move through the area, however, computer vision guides
the movement of the device, usually an arm, to different points
of interest to perform an agricultural task [6]. This shows that
recent advances in agricultural robotics go hand in hand with
advances in computer vision, especially in solutions of this
type.

Fertilization is another application that is implemented
through robotic platforms. There are two possibilities for
this task. In the first, the displacement of the platform is
guided, that is, it runs on rails supported by the greenhouse,
and the sensing focuses on locating the point at which
fertilizer is to be applied. In the second, the platform is
independent of the structure of the greenhouse, and the
sensors must simultaneously identify the environment, detect
the path, and detect the plant for the application of inputs
or for harvesting. Optimizing greenhouse production implies
the implementation of autonomous systems that make use of the
concept of precision agriculture, that is, applying the
quantities required by the plant, at the appropriate times and
in the indicated place, to avoid excess use of inputs and
reduce environmental impact.


B. Automation system for transportation of goods in hospitals 


A system utilizing semi-autonomous robots to automate the
transportation of goods in hospitals is presented. The system
basically consists of a fleet of robot vehicles, electronically
traceable containers, stations capable of automated
loading and unloading of containers, and a remote supervisory
system which manages the transportation activity. The robot
vehicles are capable of navigating autonomously with the help
of artificial landmarks placed on their paths[7]. The containers
provide a safe enclosure for the goods to be transported and
increase the asset-tracking capabilities. The stations eliminate
the need for human presence when loading and unloading the
robot vehicles.


VII. CONCLUSION 


Human-robot interaction is a growing field of research and
application. The field includes many challenging problems and
has the potential to produce solutions with positive social
impact. Its interdisciplinary nature requires that researchers
in the field understand their research within a broader context.

Cloud Robotics and Automation also introduces the potential
for robots and systems to be attacked remotely: a hacker
could take over a robot and use it to disrupt functionality or
cause damage. We have proposed a cloud robotics architecture
to address the constraints faced by current networked robots.
Cloud robotics allows robots to share computation resources,
information and data with each other, and to access new
knowledge and skills not learned by themselves. This opens
a new paradigm in robotics that we believe leads to exciting
future developments. It allows the deployment of inexpensive
robots with low computation power and memory requirements
by leveraging the communications network and the elastic
computing resources offered by the cloud infrastructure.
Applications that can benefit from the cloud robotics approach
are myriad and include SLAM, grasping and navigation.

In the years ahead, along with the introduction of robots
and AI technologies, a portion of the skills held by the
work force will become obsolete. Such a situation will likely
result in the shrinkage of the middle class and increase the
unemployment rate of unskilled workers. The utilization of
robots and AI has great potential to change our ways of living
and, eventually, our society as a whole. A survey conducted by
Nomura Research Institute, Ltd. (NRI) revealed that the
perception of Japanese, American and German consumers
regarding robots and AI technology is generally positive, and
that they are trying to integrate these technologies into their
daily lives. On the other hand, the survey also disclosed their
negative attitudes in terms of factors such as the appearance
of robots and the tasks they perform. The benefits of robots
seem to be most noticeable in productivity, safety, and in
saving time and money.


Abstract—In today's world, where everything is recorded
digitally, from our web surfing patterns to our medical records,
we are generating and processing petabytes of data every day.
Big data will be transformative in every sphere of life. But
processing and analysing those data is not enough: the human
brain tends to find patterns more efficiently when data are
represented visually. Data visualization and analytics play an
important role in decision making in various sectors. They also
open new opportunities in the visualization domain for solving
the big data problem by visual means. It is quite a challenge
to visualize such a mammoth amount of data, whether in real
time or in static form. In this paper we discuss why big data
visualization is important and review some big data
visualization frameworks, tools and applications.

Index Terms—Big data, visualization, Methods, frameworks, 
challenges, visualization tools, applications. 


I. INTRODUCTION 


We are living in a data-driven era, in which data are
continuously needed for a variety of purposes. The ability
to make timely decisions based on available data is crucial
to business success, cyber and national security, and disaster
management. The evolution of data structures and the
exponential growth of data produced by people and businesses
drove the evolution of technologies for data analysis. In our
lifespan we went from megabyte-sized datasets to petabyte-sized
datasets, which are called Big Data. Big data is an evolving
term that describes any voluminous amount of structured,
semi-structured and unstructured data that has the potential
to be mined for information.

Big data has evolved significantly over the past few years.
One of the main drivers of this evolution is storage, which
has become much more affordable. Big Data cannot be
comprehended by any user without a proper data visualization
method. Big data visualization refers to the implementation of
more contemporary visualization techniques to illustrate the
relationships within data. Visualization tactics include
applications that can display real-time changes and more
illustrative graphics, thus going beyond pie, bar and other
charts. The primary goal of big data visualization is to
communicate information clearly and efficiently via statistical
graphics, plots and information graphics. These illustrations
veer away from the use of hundreds of rows, columns and
attributes toward a more artistic visual representation of the
data. Effective visualization helps users to analyse and reason
about data and evidence. Data visualization is both an art and
a science. It is viewed as a branch of descriptive statistics
by some, but also as a grounded theory development tool by
others.


Suitable frameworks are used for the proper visualization
of big data. One framework-providing tool is D3.js [1], which
can reduce the cost of building a data warehouse, and another
is DeepEye[2], which automates the process of transforming
given datasets into images. Big data visualization is also used
in passenger flow analysis[3], which is an example of a high
visualization framework. The big data environment also helps
to resolve cyber security issues such as finding the attacker.
There are security challenges of big data, as well as security
issues, that the analyst must understand during the process of
visualization. The security challenges include fraud detection,
network forensics, data privacy issues, and data provenance
problems. Augmented reality[4] is one of the popular methods
and solves many issues such as narrow vision angle, navigation
and scaling.

Various tools have emerged to help with the problems
of visualization. Interactivity is the most important feature
that a visualization tool must have. Some of the most popular
visualization tools are Tableau[5], Irregular Trend Finder[6],
Parallel coordinates[7] and SpaceViz[8]. Nanocubes and
visualization-based data discovery tools[9] are other means of
big data visualization.


II. METHODS AND CONCEPTS 


We took the idea for this topic from IEEE Xplore.
The survey covers the years 2008 to 2018. From the survey, we
identified that the methods for big data visualization are
mainly classified into frameworks and tools. These tools are
quite promising: they generate rich and interactive
visualizations, and most of them handle huge volumes of data
and respond in an acceptable amount of time. Before choosing
any visualization tool or framework, a business should review
its requirements and decide which tool suits it best. Figure 1
shows the various frameworks and tools used for big data
visualization.


Aswathy C Viswam et al., “A Study on Big Data Visualisation” 



Proceedings of the Vidya Computer Applications Departmental Seminar (VCADS - 2018), 4 - 5 April 2018 
Department of Computer Applications, Vidya Academy of Science & Technology, Thrissur — 680501 


[Figure: classification diagram. Methods divide into Frameworks
(DeepEye, D3.JS) and Tools (Parallel Coordinate, ITF, Tableau,
Microsoft Power BI, Plotly, Gephi, SpaceViz, Excel 2016).]

Fig. 1. Classification of methods for Big data visualization


A. Frameworks 


DeepEye and D3.JS are the most popular frameworks used
for big data visualization.

1) DeepEye: DeepEye automates the process of transforming
given datasets into images. Given a (big) dataset, the
essential task of visualization is to visualize the data to tell
compelling stories by selecting, filtering, and transforming the
data, and picking the right visualization type such as bar charts
or line charts[2]. The main goal of this framework is to automate
this process. It faces three main challenges:


i) Visualization verification: To determine whether a
visualization for a given dataset is interesting, from the
viewpoint of human understanding.

ii) Visualization search space: A “boring” dataset may
become interesting after an arbitrary combination of
operations such as selections, joins, and aggregations,
among others.

iii) On-time responses: Do not deplete the user's patience.


DeepEye aims to address the above-mentioned challenges by
targeting a wide spectrum of important applications to enable
on-time automatic data visualization:


• Data understanding: The first step of running any data
analysis is to understand the data in hand. This is typically
a manual task, which is conducted by using visual
analytics tools.

• Query understanding: Writing SQL queries for business
applications, especially complicated queries, is seldom a
one-shot job. At times, one wants to know what the actual
results, not the query, look like. Explaining the query
result by visualization is clearly preferable to the users.

• Dynamic data monitoring: In the real world, data is always
dynamic and not static. Data visualization can help
identify and act on emerging trends faster.
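
The size of the visualization search space mentioned above can be made concrete with a small sketch. This is not DeepEye's actual algorithm: the column metadata, the list of aggregations and the chart-selection rule in `chart_type` are hypothetical stand-ins chosen only to show how quickly candidate visualizations multiply.

```python
from itertools import product

# Hypothetical column metadata: name -> kind ("categorical", "temporal" or "numerical").
COLUMNS = {"region": "categorical", "month": "temporal", "sales": "numerical"}
AGGREGATIONS = ["sum", "avg", "count"]


def chart_type(x_kind):
    """Toy rule: trends over time are drawn as lines, categories as bars."""
    return "line" if x_kind == "temporal" else "bar"


def candidates(columns):
    """Enumerate (chart, x, aggregation, y) for every non-numerical x and numerical y."""
    xs = [c for c, kind in columns.items() if kind != "numerical"]
    ys = [c for c, kind in columns.items() if kind == "numerical"]
    return [(chart_type(columns[x]), x, agg, y)
            for x, y, agg in product(xs, ys, AGGREGATIONS)]


space = candidates(COLUMNS)
for candidate in space:
    print(candidate)
```

Even this three-column table yields six candidates. Adding filters, joins and further chart types grows the space combinatorially, which is why an automatic system needs a way to rank candidates and still answer on time.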


2) D3.JS: D3.js is a web visual presentation tool that
provides a framework to achieve better visualization for Big
Data[1]. This framework can reduce the cost of building
Big Data warehouses by dividing the data into sub-datasets and
visualizing each of them separately. In this framework, current
data can be presented to users along different dimensions using
a rich variety of statistical graphics. Based on the idea “First
give a right-size, properly screened summary, and then display
the details needed”, interactive visualization graphics are
shown for the analysis of different dimensions.

Each data source can be a small data warehouse in the
framework. Each data source calculates statistics and
visualizations according to demand and provides an access
interface. Finally, the data are returned, summarized,
displayed in the browser and stored in the data warehouse. This
approach can reduce the cost of setting up a large data
warehouse. Through the interface, the original data values can
also be hidden, so the relationship between the data and the
user is translucent, unlike plain D3, where it is transparent.
Based on this idea, a data source for data calculation and
visualization can be achieved. D3.js is a JavaScript library.
Because it runs directly in the browser, visualizing with D3.js
needs only a reasonably recent browser version rather than any
plug-ins.
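
The idea that each data source returns only a summary, which is then combined for display, can be sketched as follows. The sketch is written in Python rather than JavaScript and does not use D3.js itself. The function names `summarise` and `merge` and the sample rows are hypothetical.

```python
def summarise(rows, key, value):
    """Run at a data source: reduce raw rows to {group: (count, total)}."""
    summary = {}
    for row in rows:
        count, total = summary.get(row[key], (0, 0.0))
        summary[row[key]] = (count + 1, total + row[value])
    return summary


def merge(summaries):
    """Run at the collecting side: combine per-source summaries, never raw rows."""
    merged = {}
    for summary in summaries:
        for group, (count, total) in summary.items():
            c, t = merged.get(group, (0, 0.0))
            merged[group] = (c + count, t + total)
    return merged


# Hypothetical raw data held by two separate sources.
source_a = [{"city": "Thrissur", "sales": 10.0}, {"city": "Kochi", "sales": 5.0}]
source_b = [{"city": "Thrissur", "sales": 20.0}]

combined = merge([summarise(source_a, "city", "sales"),
                  summarise(source_b, "city", "sales")])
averages = {city: total / count for city, (count, total) in combined.items()}
print(averages)
```

Keeping counts and totals instead of averages is a deliberate choice in this sketch: averages cannot be merged correctly across sources, whereas counts and totals can, and the raw values never leave the source.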


B. Tools for visualization 


Traditional visualization tools reach their limits when
they encounter very large data sets that evolve continuously.
A visualization tool should be able to provide interactive
visualization with as low latency as possible. A Big Data
visualization tool must be able to deal with semi-structured
and unstructured data, because big data usually comes in these
formats. Most current visualization tools perform poorly in
scalability, functionality and response time. Methods have been
proposed which not only visualize the data but also process it
at the same time.

Various tools have emerged to help with big data
problems. The most important feature that a visualization
must have is interactivity, which means that the user should
be able to interact with the visualization. Some of
the most popular visualization tools are given below.


i) Parallel Coordinate: Parallel coordinates are a popular
tool for visualizing high-dimensional data and analysing
multivariate data[7]. With the rapid growth of data
size and complexity, data clutter in parallel coordinates
is a major issue for Big Data visualization. This has
given rise to three problems: 1) how to rearrange the
parallel axes without the loss of data patterns, 2) how
to shrink data attributes on each axis without the loss
of data trends, 3) how to visualize the structured and
unstructured data patterns for Big Data analysis.
Parallel coordinates are used to classify Big Data, and the
classification can be applied across multiple datasets,
different data types and topics. In the approach of [7], the
Big Data attributes are first analysed and the 5Ws subsets
are introduced on the parallel axes. Secondly, the 5Ws sending
density and receiving density are established as additional
parallel axes to measure Big Data flow patterns across multiple
datasets for different data types and topics. The 5Ws
dimensions stand for: When did the data occur, Where did the
data come from, What was the data content, How was the data
transferred, Why did the data occur, and Who received
the data. The 5Ws density parallel coordinates with
re-ordering and clustered views provide visual structures
and patterns to illustrate the close relationship between
the axes in a graphic layout. This clearly demonstrates Big
Data patterns for different datasets, different topics and
different data types in visualization.

ii) Irregular Trend Finder (ITF): Irregular Trend Finder
is an interactive tool designed for analysing large
amounts of data with timestamps and a hierarchical structure
in mind, so that a user can see the overview first
and then obtain more detailed information [6]. The
main goal of ITF is to give an overview and then show
detailed information about the anomalies quickly.
ITF finds anomalies following the principle "overview
first, zoom and filter, then details on demand". ITF was
developed in Processing with a JSON library. It has a
hierarchical structure and consists of 3 views, from top
to bottom: All-View (AView), Region-View (RView),
and Branch-View (BView). The user can see more
detailed information with each deeper view. Once the user
is looking at a certain view, he or she is able to see
another view by switching the tab in the upper area of
the window. The user can also clear the view by clicking
the current tab (except AView).

iii) Tableau: Tableau is an interactive data visualization
tool which is focused on Business Intelligence. Tableau
provides a very wide range of visualization options [5].
It provides the option to create custom visualizations. It
is fast and flexible. It supports almost all data
formats and connections to various servers. The user interface
is intuitive, and a wide variety of charts is available. For
simple calculations and statistics one does not require
any coding skills, but for heavy analytics we can run
models in R and then import the results into Tableau.
This requires quite a bit of programming skill, depending
upon the task we need to perform.

iv) Microsoft Power BI: Power BI is a powerful cloud-based
business analytics service [5]. Its visualizations are
interactive and rich. Power BI consists of 3 elements:
Power BI Desktop, Service (SaaS) and Apps. Having all of
these services available makes Power BI
flexible and persuasive. With more than 60 types of
source integration, you can start creating visualizations
in a matter of minutes. Power BI combines the familiar
Microsoft tools like Office, SharePoint and SQL Server.
The feature that distinguishes it from other tools is that
you can use natural language to query the data. You
do not require programming skills for this tool.

v) Plotly: Plotly, also known as Plot.ly, is built using
Python and the Django framework. The actions it can
perform are analysing and visualizing data. It is free
for users but with limited features; for all the features
we need to buy the professional membership. It creates
charts and dashboards online but can also be used as an offline
service inside Python notebooks, Jupyter notebooks and
pandas. A variety of charts is available, such as
statistical charts, scientific charts, 3D charts, multiple
axes and dashboards. Plotly uses a tool called Web Plot
Digitizer (WPD) which automatically grabs the data
from a static image[10].

vi) Gephi: Gephi is an open-source network analysis tool
written in Java and OpenGL. It is used to handle very large
and complex datasets. The network analysis includes:


• Social Network Analysis
• Link Analysis
• Biological Network Analysis


With its dynamic data exploration, Gephi stands out from the
rest of its competition for graph analysis. No programming
skills are required to run this tool, but a good knowledge
of graphs is necessary. It uses a GPU 3D render engine to
accelerate performance and give real-time analysis
[11].

vii) Excel 2016: Microsoft Excel is a spreadsheet developed
by Microsoft[5]. It can be used not only for Big Data
and statistical analysis but also as a powerful
visualization tool. Using Power Query, Excel can connect to
most services, such as HDFS and SaaS, and is capable
of managing semi-structured data. Combined with
visualization techniques like Conditional Formatting and
interactive graphs, this makes Excel 2016 a good contender
in the ocean of Big Data visualization tools.

viii) Nanocubes and visualization based data discovery:
Visualization is one tool which has been shown to be effective
for gleaning insight from big data. Data summarization
alone will never solve the problem of scale in
exploratory visualization[9]. Data cubes are structures
that perform aggregations across every possible set of
dimensions of a table in a database, to support quick
exploration. This technique enables real-time exploratory
visualization on datasets that are large, spatiotemporal,
and multidimensional. The speed of the data cube structure
hinges partly on it being small enough to fit in main memory;
such a compact data cube is called a nanocube.
Nanocubes are unique in the sense that visualizing data
does not require heaps of local resources. Nanocubes still
take more memory than would be desirable. The authors of [9]
envision dynamic control over the cardinality of dimensions
but leave that for future work, and they would also like to
explore hybrid solutions that utilize both on-disk and
in-memory data structures to enable more complex nanocubes.
Visualization-based data discovery tools are another
technique for big data visualization which allows business
users to mash up disparate data sources to create
custom analytical views with a flexibility and ease of use
that simply did not exist before. End users can view the
graphics on the same gadgets, or on even smaller mobile
devices such as tablets or, in limited cases, smartphones.

ix) SpaceViz: SpaceViz is an information visualization tool
designed for depicting hierarchies, such as directory
structures[8]. SpaceViz can scan, map and manage
PC, network and cloud storage of multimedia data.
SpaceViz is a tool for displaying hierarchies of computer
stored data in multiple formats, and it relies on
space-utilization methods. SpaceViz is similar to Treemap
in that it constructs classical, qualified displays using
a rectangular visual technique, but differs from Treemap
in that it employs multiple views at the same time
with different brushing methods, such as colour-coding
the file formats and cushioning the nodes for clear
visualization.
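
The data cube idea behind item viii), aggregating across every possible set of dimensions, can be sketched in a few lines. This is a plain data cube built with dictionaries, not the compact nanocube structure of [9]. The function name `build_cube` and the sample rows are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dimensions):
    """Pre-compute COUNT(*) for every subset of the given dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            counts = defaultdict(int)
            for row in rows:
                counts[tuple(row[d] for d in dims)] += 1
            cube[dims] = dict(counts)
    return cube


# Hypothetical event records with three dimensions.
rows = [
    {"city": "Thrissur", "hour": 9, "device": "android"},
    {"city": "Thrissur", "hour": 9, "device": "iphone"},
    {"city": "Kochi", "hour": 10, "device": "android"},
]
cube = build_cube(rows, ("city", "hour", "device"))

# Any group-by is now a dictionary lookup instead of a scan over the rows.
print(cube[("city",)])
print(cube[("city", "device")][("Thrissur", "android")])
```

With n dimensions the cube holds 2^n group-by tables, which is why memory use is the central concern and why the nanocube work concentrates on keeping the structure small enough for main memory.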


III. APPLICATIONS


Data visualization is one of the most important applications
in the Big Data era. It can greatly improve the capability of
data mining. Applying big data visualization makes it possible
to characterize and explain the data more effectively and
intuitively.


A. Shanghai Metro Network 


The passenger flow data of an urban rail transit (URT) network
is large in scale, updated quickly, multi-modal, difficult
to identify and of great value, just like Big Data[12].
It is meaningful and effective to use big data visualization in
passenger flow analysis. With high visualization frameworks,
the massive passenger flow data of the Shanghai Metro network
is rendered graphically in time and space and is processed
from four aspects: the network, line, station and section. This
is efficient for mining the passenger flow data further and
for revealing more information and regularities. The research
results provide new means for passenger flow analysis and for
operation aid decision making (ADM) in the operation and
management departments of urban rail transit.


B. Smart Cyber University 


Replacing statistical studies with a comprehensive analysis
of Big Data in education leads to the concept of a
cyber-physical system, the Smart Cyber University, which is
characterized by: a digitized space of regulatory rules;
accurate monitoring and active management of the
cyber-addressable components of scientific and educational
processes; automatic generation of regulatory operational
actions; cyber solutions that manage financial and human
resources independently of managers; and the exclusion of
paper carriers from the scientific and educational
processes[13]. Using the Smart Cyber University system creates
an unprecedented opportunity for the development of the
university: it gives rise to deeper and faster insights, which
can play a key role in decision making, improve the quality of
teachers and education, and accelerate the pace of innovation.


IV. CONCLUSION 


We have borrowed the idea for this topic from IEEE Xplore.
The survey covers the years 2008 to 2018.
The first few years saw no great advancements in the field
whatsoever; it was only after 2013 that ample information began
to flow in. In brief, this study mainly deals with the
classification of methods into two groups, namely frameworks
and tools, with special emphasis given to tools. We learn that,
other than frameworks, assorted types of tools have emerged to
solve several big data visualization problems.

We have reached the conclusion that tools are proving
useful in various fields. Over the years, more advanced and
versatile tools have come into prominence. Such tools are now
being used in various real-time applications, and in the near
future more advanced concepts built on them can be expected
to evolve.


Abstract—In the field of image analysis, image enhancement
and segmentation play a crucial role. Texture is also an
important part of image analysis. Image enhancement is applied
to an image in order to make it more effective for a computer
to process. Segmentation is the process of detecting and
extracting regions. Clustering is used with segmentation to
obtain visually meaningful regions, each of which has uniform
properties. Cluster-based segmentation is commonly used for
segmentation. In this paper, various contrast enhancement
techniques for low-contrast images and different image
segmentation techniques are reviewed.

Index Terms—Texture, Segmentation, Image Enhancement, 
Noise Removal, Clustering, Contrast Enhancement, Histogram 
Equalization, Spatial Domain, Frequency Domain, Enhancement
Techniques


I. INTRODUCTION 


Image enhancement means improving the appearance of an image
by increasing the dominance of some features or by
decreasing the ambiguity between different regions of the image.
The aim of image enhancement is to process an image so that the
outcome is more suitable than the original image for a
specific application. Image enhancement techniques are basically
divided into two groups: a) Spatial Domain Techniques and
b) Frequency Domain Techniques.

Spatial domain methods operate on the pixels of the input
image. Spatial techniques are used specifically for directly
changing the gray level values of single pixels, and thus they
change the image as a whole. Frequency domain techniques are
based on operations on the orthogonal transform of the image and
are convenient for processing the image according to its
frequency content.

Another image enhancement technique is Histogram Equalization.
There are different types of Histogram Equalization, such as
a) Adaptive Histogram Equalization and b) Histogram
Equalization. These techniques are mainly used for improving
contrast in an image. The adaptive variants do so by
transforming each pixel with a transformation function derived
from its neighboring region. One of the important Histogram
Equalization techniques is Contrast Limited Adaptive Histogram
Equalization (CLAHE). It enhances the contrast of a grayscale
image by working on small regions of the image. Improved CLAHE
is applicable in X-ray imaging.

Enhancement techniques improve the visibility of any portion
or feature of the image. Enhancement techniques for underwater
acoustic images are therefore also presented. Histogram
Equalization is commonly used for enhancement, but the
global properties of the image cannot be properly applied
in a local context, and it also performs poorly in
detail preservation. Mathematical Morphology (MM) is one of
the important approaches used for the image enhancement
problem in digital images. It is based on set theory and a
set of associated operations.

Segmentation is another important part of image analysis.
Segmentation is the process of extracting regions, and it
is divided into two kinds: a) Supervised and b) Unsupervised.
The three main techniques of image texture segmentation[1]
are edge-based, cluster-based and region-based. The most
commonly used technique is cluster-based segmentation.
In cluster-based segmentation, we first extract feature vectors
from the input image. The feature vectors are then passed to a
cluster estimator to find the true number of clusters. Finally,
the true cluster number is fed to the clustering segmenter, which
partitions the input feature vectors into subsets and labels the
corresponding image subunits. Image segmentation is common
and particularly useful when looking for defects in a material
or pathological cells in biological tissue.
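
The clustering stage of this pipeline can be sketched with plain k-means on pixel gray values. This is a simplified illustration: the feature vector is assumed to be the gray value alone, the number of clusters `k` is assumed to be already known (standing in for the cluster estimator), and the function name `kmeans_1d` and the sample image are hypothetical.

```python
def kmeans_1d(values, k, iterations=20):
    """Plain k-means on scalar features; returns (centres, labels)."""
    lo, hi = min(values), max(values)
    # Spread the initial centres evenly over the value range (deterministic start).
    centres = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centre.
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:
                centres[c] = sum(members) / len(members)
    return centres, labels


# Hypothetical 2x3 grayscale image with a dark and a bright region.
image = [
    [10, 12, 200],
    [11, 198, 205],
]
pixels = [p for row in image for p in row]
centres, labels = kmeans_1d(pixels, k=2)

# Reshape the flat label list back into the image layout.
width = len(image[0])
segmented = [labels[i:i + width] for i in range(0, len(labels), width)]
print(centres, segmented)
```

For texture segmentation the feature vectors would normally be richer than a single gray value, for example local statistics or filter responses, but the partition-and-label step works the same way.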

This paper is organized into 8 sections. Section 1 gives an
overview of the paper. Section 2 describes different image
enhancement techniques. Section 3 describes the importance of
contrast enhancement in image processing. Sections 4 and 5 give
brief descriptions of various image enhancement techniques
using the transformation domain and filtering techniques.
Section 6 describes various segmentation techniques,
whereas section 7 describes in detail the technique called
mathematical morphology. The conclusion is given in section 8.


II. IMAGE ENHANCEMENT TECHNIQUES 


The basic principle of image enhancement is to process an
image so that the outcome is more suitable than the original
image for a specific application. With the many different
methods and techniques used in image processing,
image enhancement has become an active research topic. There
is no general theory of image enhancement.

There are basically two types of image enhancement
techniques:

1) Spatial Domain Technique

2) Frequency Domain Technique


A. Spatial Domain Techniques 


Spatial domain methods operate on the pixels of the input
image. The values of the pixels are manipulated to attain
the desired enhancement. Spatial domain techniques such as
logarithmic transforms, power law transforms and histogram
equalization are based on the direct manipulation of the
pixels in the image. Spatial techniques are specifically useful
for directly changing the gray level values of single pixels
and therefore the overall contrast of the whole image. But they
usually enhance the whole image in a uniform manner, which in
many cases produces undesirable results. It is not possible to
selectively enhance edges or other required information
effectively. Techniques like histogram equalization are
effective for many images. The approaches can be classified
into two categories: Spatial filter operations and
Point Processing operations.
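
The two point-processing transforms named above can be written down directly. The sketch assumes 8-bit images (256 gray levels) stored as nested lists. The helper names and the sample image are hypothetical, and the log transform is scaled so that the maximum gray level maps to itself.

```python
import math

L = 256  # number of gray levels, assuming 8-bit images


def log_transform(pixel):
    """s = c * log(1 + r), with c chosen so that 255 maps to 255."""
    c = (L - 1) / math.log(L)
    return round(c * math.log(1 + pixel))


def power_law(pixel, gamma):
    """s = (L-1) * (r / (L-1)) ** gamma; gamma < 1 brightens, gamma > 1 darkens."""
    return round((L - 1) * (pixel / (L - 1)) ** gamma)


def apply(image, transform):
    """Point processing: every pixel is mapped independently of its neighbours."""
    return [[transform(p) for p in row] for row in image]


# Hypothetical 2x3 image dominated by dark values.
dark = [[0, 16, 64], [128, 200, 255]]
brighter = apply(dark, lambda p: power_law(p, 0.5))
logged = apply(dark, log_transform)
print(brighter)
print(logged)
```

Both transforms expand the dark end of the gray scale and compress the bright end. Because the same mapping is applied to every pixel, the whole image changes uniformly, which is the limitation noted in the paragraph above.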


B. Frequency Domain Techniques 


Frequency domain methods are based on operations on
the orthogonal transform of the image rather than on the
image itself. Frequency domain methods are convenient for
processing the image according to its frequency content.
The orthogonal transform of the image has two components,
magnitude and phase. The magnitude consists of the frequency
content of the image. The phase is used to restore the image
back to the spatial domain. Examples of such transforms are the
Fourier transform and the Hartley transform. The transform
domain permits operations on the frequency content of a poor
contrast image, and therefore the high frequency content, such
as edges and other subtle information, can easily be enhanced.
Frequency domain techniques work on the Fourier transform of
an image.


• Edges and sharp transitions (e.g. noise) in an image
contribute significantly to the high frequency content of the
Fourier transform.

• Low frequency contents in the Fourier transform are
responsible for the general appearance of the image over
smooth areas.

• The filtering scheme is easier to visualize in the
frequency domain.
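
The relationship between edges and high frequencies can be seen in a minimal sketch. For brevity it works on a single image row with a naive discrete Fourier transform written from the definition. A real implementation would use a 2-D fast Fourier transform from a numerical library. The function names and the sample row are hypothetical.

```python
import cmath


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a 1-D sequence."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse transform; keeps real parts because the input signal was real."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def high_pass(signal, cutoff):
    """Zero every frequency bin whose index (or mirrored index) is below cutoff."""
    n = len(signal)
    spectrum = dft(signal)
    kept = [0 if min(k, n - k) < cutoff else value
            for k, value in enumerate(spectrum)]
    return idft(kept)


# One image row: a flat dark area, then a step edge to a flat bright area.
row = [10, 10, 10, 10, 200, 200, 200, 200]
edges = high_pass(row, cutoff=1)  # cutoff=1 removes only the mean (zero-frequency) term
print([round(v, 3) for v in edges])
```

Removing the zero-frequency term subtracts the average brightness and leaves the step itself, while raising the cutoff would suppress progressively more of the smooth content and leave mostly the sharp transition.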


III. CONTRAST ENHANCEMENT TECHNIQUES


A. Histogram Equalization 


The histogram of an image describes the distribution of its
gray levels. A histogram can be used to decide whether a given
image is dark or light, and whether it has low or high contrast.
It can be expressed as the discrete function $h(r_k) = n_k$,
where $r_k$ is the $k$-th gray level and $n_k$ is the number of
pixels with that gray level. Histogram equalization is applied
to an image to improve its visual appearance[6]. This
technique involves:

1) Dividing image into segments. 

2) The histogram is applied to find out the pixel intensity 
values for the gray levels and the image has gray levels 
or intensities in the range from 0 to 255. 

3) Histogram Equalization is used to calculate the intensity 
values and make them uniform distribution of pixels to 
get an enhanced image. Thus HE technique is used to 
grow the dynamic range of pixels for the appearance of 
an image. 
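
The remapping in steps 2 and 3 can be sketched as follows. This is a minimal illustration of global histogram equalization on the whole image (no segmentation step), using the common cumulative-distribution mapping; the function name and the nested-list image representation are assumptions made for the example.

```python
def equalize_histogram(image, levels=256):
    """Globally equalize a grayscale image given as a list of rows of ints."""
    pixels = [p for row in image for p in row]
    # Histogram: number of pixels at each gray level.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution function (CDF) of the gray levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:
        # Constant image: there is nothing to stretch.
        return [row[:] for row in image]
    # Lookup table that makes the output CDF approximately linear.
    lut = [max(0, round((c - cdf_min) * (levels - 1) / (total - cdf_min)))
           for c in cdf]
    return [[lut[p] for p in row] for row in image]

low_contrast = [[100, 101], [102, 103]]
print(equalize_histogram(low_contrast))   # [[0, 85], [170, 255]]
```

The four nearly identical input levels are spread over the full 0 to 255 range, which is the increase in dynamic range described above.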


Fig. 1. Histogram equalization: (a) original image, (b) image
enhanced by histogram equalization


B. Adaptive Histogram Equalization 


Adaptive Histogram Equalization (AHE) is used for improving
contrast in images. It differs from Histogram Equalization in
that it computes several histograms, each corresponding to a
distinct section of the image. Global Histogram Equalization
may not sufficiently enhance the contrast of a local region of
an image. AHE improves on this by transforming each pixel
with a transformation function derived from a neighboring
region. It is used to overcome some limitations of the global
linear min-max windowing method. A known limitation is that
AHE tends to amplify noise in relatively homogeneous regions
of the image, which CLAHE, described later, addresses. AHE
can improve the contrast of both grayscale and color images.


Fig. 2. Adaptive Histogram Equalization (before enhancement)


C. Bi-Histogram Equalization


To overcome the brightness shift introduced by the HE method,
the brightness preserving bi-histogram equalization (BBHE)
method was proposed. BBHE decomposes the original image
into two sub-images using the mean gray level of the image
and then applies the HE method to each of the sub-images.
In other words, the histogram of the image is separated into
two sub-histograms at the mean gray level of the original
image, and the sub-histograms are equalized independently
using a refined histogram equalization, which gives a flatter
histogram.
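
A minimal sketch of the split-and-equalize idea follows. It assumes the common formulation in which pixels at or below the (integer) mean are equalized into the range from 0 to the mean and the remaining pixels into the range above it; the helper names are hypothetical and the "refined" equalization mentioned above is not reproduced.

```python
def _equalize_into_range(values, lo, hi):
    """Lookup table that equalizes `values` into the gray-level range [lo, hi]."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    lut, running = {}, 0
    for level in sorted(counts):
        running += counts[level]                 # cumulative count up to level
        lut[level] = lo + round((hi - lo) * running / len(values))
    return lut

def bbhe(image, levels=256):
    """Bi-histogram equalization of a grayscale image (list of rows of ints)."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) // len(pixels)            # integer split point
    lower = [p for p in pixels if p <= mean]     # never empty: min <= mean
    upper = [p for p in pixels if p > mean]
    lut = _equalize_into_range(lower, 0, mean)
    if upper:
        lut.update(_equalize_into_range(upper, mean + 1, levels - 1))
    return [[lut[p] for p in row] for row in image]

print(bbhe([[10, 20], [200, 210]]))   # [[55, 110], [183, 255]]
```

Because each sub-image stays on its own side of the mean, dark pixels cannot be mapped above the mean and bright pixels cannot be mapped below it, which is what limits the brightness shift.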


Fig. 3. Bi-histogram equalization (axes: number of pixels
versus gray level, from 0 to L-1)


D. Gray Level Grouping (GLG) 


In Gray Level Grouping (GLG), the basic procedure is to 
first group the histogram components of a low-contrast image 
into a proper number of bins according to a selected criterion, 
then redistribute these bins over the gray scale uniformly, and 
finally ungroup the previously grouped gray-levels. 

GLG not only produces results superior to conventional 
contrast enhancement techniques, but is also fully automatic 
in most circumstances, and is applicable to a broad variety of 
images. 


E. Contrast-Limited Adaptive Histogram Equalization 
(CLAHE) 


CLAHE (contrast-limited adaptive histogram equalization)
enhances the contrast of a grayscale image by transforming
its intensity values. It operates on small regions of the
image, called tiles, rather than on the full image [7]. The
contrast of each tile is enhanced so that the histogram of the
output region approximately matches the histogram specified
by the distribution criterion. Neighboring tiles are then
combined using bilinear interpolation to remove artificially
induced boundaries. The contrast, especially in homogeneous
areas, can be limited to avoid amplifying any noise that might
be present in the image. One example of how widely CLAHE is
used today is spectral Hybrid Photon Counting detectors for
X-ray imaging, which are presented as using an improved
CLAHE.


Fig. 4. Original Image and Image Enhanced by CLAHE
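
The contrast limiting used by CLAHE is commonly implemented by clipping the histogram of each tile before it is equalized. The sketch below shows only that clipping step, under the assumption of a single redistribution pass; the function name and the toy histogram are illustrative, and tile handling and bilinear interpolation are omitted.

```python
def clip_histogram(hist, clip_limit):
    """Clip every bin at clip_limit and spread the excess over all bins."""
    excess = sum(max(0, count - clip_limit) for count in hist)
    clipped = [min(count, clip_limit) for count in hist]
    share, remainder = divmod(excess, len(hist))
    # Each bin receives an equal share; the first `remainder` bins get one
    # more, so the total pixel count of the tile is preserved.
    return [count + share + (1 if i < remainder else 0)
            for i, count in enumerate(clipped)]

tile_hist = [10, 0, 0, 2]      # toy 4-level histogram of one tile
print(clip_histogram(tile_hist, clip_limit=4))   # [6, 2, 1, 3]
```

Flattening the dominant bin limits the slope of the resulting mapping, which is what keeps noise in homogeneous tiles from being over-amplified. After a single pass a bin may again exceed the limit, as the first bin does here; some implementations repeat the redistribution.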

Hybrid Photon Counting (HPC) technology is well suited to
low-dose and multi-energy imaging in medical applications.
The single-photon counting mechanism, together with the
absence of electronic noise, leads to a large reduction of
patient dose. In addition, HPC detectors open the path to
dual-energy and multi-energy (spectral) imaging. This
technology goes beyond conventional dual-energy technology
by enabling multi-energy imaging in a single shot, without kV
switching or dual-source techniques. Its key advantages are:
noise-free detection that achieves a high signal-to-noise
ratio at minimal dose; spectral imaging beyond classical
dual-energy technology; high soft tissue contrast due to an
extreme dynamic range; high frame rates; and spectral imaging
information that can be retrieved retrospectively.

Pre-clinical and medical researchers benefit from hybrid
photon counting (HPC) detectors. According to the
manufacturer, the technology detects every single X-ray
photon and offers high sensitivity, high image quality and low
dose. It also helps in Mammography and Digital Breast
Tomosynthesis (DBT) by clearly revealing the target objects,
which represent the different structures or malignancies found
when imaging the breast. In angiography, it removes the need
to acquire multiple X-ray images, which causes high patient
dose and motion artifacts; a single shot is sufficient, which
enables dynamic image acquisition and reduces motion artifacts
and dose. HPC detectors from DECTRIS, which use the improved
CLAHE, enable Spectral Computed Tomography (CT), an
increasingly popular technique to reduce CT dose, enhance
tissue differentiation and acquire quantitative maps of
material densities or contrast agent concentrations.

DECTRIS describes itself as the leader in X-ray photon
counting detection technology. Its Hybrid Photon Counting
technology has transformed basic research at synchrotron light
sources and in X-ray laboratories during the past years. Over
1500 detector systems have been installed worldwide, which
indicates the maturity of this technology.

Two widely noted works from the literature are as follows:


• Rasti et al. [7] proposed a new medical image illumination
enhancement and sharpening technique.

• Mithilesh Kumar and Ashima Rana [8] presented image
enhancement using the CLAHE method to reduce the noise
that can be present in a digital image.


IV. IMAGE ENHANCEMENT TECHNIQUES USING 
TRANSFORMATION DOMAIN 


In this section, we briefly review image enhancement
techniques for acoustic images, in which the scene of the
seafloor is captured using special instruments such as sonar,
which uses sound as the source to capture the image.


A. Curvelet Transform Domain 


The curvelet transformation is a four step process which
includes sub-band decomposition, smooth partitioning,
renormalization and ridgelet analysis. The sub-band
decomposition divides the image into low and high frequency
sub-bands. The low frequency components contain the smooth
areas in an image, and the high frequency components consist
of horizontal, vertical and diagonal edge information. Smooth
windows are defined as a collection of windows localized
around the dyadic squares. Each square is analyzed using the
orthonormal ridgelet system. Then, in parallel, a threshold is
estimated for the high frequency sub-bands and a non-linear
mapping enhancement is applied. The resultant sub-bands are
further processed by the inverse curvelet transform to produce
the new enhanced image.


B. Wavelet Transform 


The acoustic images are obtained by synthetic aperture
sonar. The target response and the background reverberation
noise are separated using a coherence-based wavelet shrinkage
method. The targets are much more coherent than the sea
floor sediments, and this property is used to generate the
independent looks. The discrete wavelet transformation [7,9]
is applied to both the target look and the complement look.
Weights are assigned to the wavelet coefficients of the looks,
and the inverse wavelet transformation is applied in order to
obtain the enhanced acoustic image.


C. K-L Transform 


Here a discrete wavelet transformation is applied to the
acoustic image, which decomposes the image into four
frequency sub-bands, namely LL, LH, HL and HH. The LL
component, which contains the smooth areas, is enhanced using
the K-L transform [9]. The enhanced LL band is combined with
the high frequency bands and given to the inverse wavelet
transform in order to produce an enhanced acoustic image.
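
The decomposition into LL, LH, HL and HH and the inverse transform can be sketched with a one-level Haar wavelet. The choice of the Haar wavelet is an assumption made for illustration (the surveyed work does not name the wavelet here), the naming of the detail bands varies between authors, and the K-L enhancement of the LL band is not shown.

```python
def haar_decompose(image):
    """One-level 2-D Haar transform of an image with even height and width."""
    bands = {"LL": [], "LH": [], "HL": [], "HH": []}
    for r in range(0, len(image), 2):
        rows = {name: [] for name in bands}
        for c in range(0, len(image[0]), 2):
            a, b = image[r][c], image[r][c + 1]           # top pixel pair
            d, e = image[r + 1][c], image[r + 1][c + 1]   # bottom pixel pair
            rows["LL"].append((a + b + d + e) / 2)   # local average (smooth)
            rows["LH"].append((a + b - d - e) / 2)   # top minus bottom
            rows["HL"].append((a - b + d - e) / 2)   # left minus right
            rows["HH"].append((a - b - d + e) / 2)   # diagonal difference
        for name in bands:
            bands[name].append(rows[name])
    return bands

def haar_reconstruct(bands):
    """Inverse of haar_decompose: rebuild the image from the four sub-bands."""
    image = []
    for i in range(len(bands["LL"])):
        top, bottom = [], []
        for j in range(len(bands["LL"][0])):
            ll, lh = bands["LL"][i][j], bands["LH"][i][j]
            hl, hh = bands["HL"][i][j], bands["HH"][i][j]
            top += [(ll + lh + hl + hh) / 2, (ll + lh - hl - hh) / 2]
            bottom += [(ll - lh + hl - hh) / 2, (ll - lh - hl + hh) / 2]
        image += [top, bottom]
    return image

image = [[10, 10, 50, 50],
         [10, 10, 50, 50],
         [10, 10, 90, 20],
         [10, 10, 20, 90]]
bands = haar_decompose(image)
print(bands["LL"])                        # smooth approximation
print(haar_reconstruct(bands) == image)   # True: the transform is invertible
```

In the scheme described above, only `bands["LL"]` would be modified before reconstruction, so the edge information held in the other three bands is passed through unchanged.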


D. Contourlet Transform 


The contourlet transform is applied to enhance images which
suffer from low illumination or non-uniform lighting
conditions. The steps in the contourlet transform are:


1) Sub-band decomposition
2) Directional transform


In the sub-band decomposition stage, point discontinuities are
captured using a Laplacian pyramid. In the directional
transform stage, a directional filter bank is used to link
point discontinuities into linear structures. The contourlet
transform uses a simple mathematical transformation function;
hence the algorithm provides fast results. The process does
not introduce any distortion to the original image.


V. IMAGE ENHANCEMENT USING FILTERING TECHNIQUES 


Noise such as Gaussian, speckle, and salt and pepper noise
may be introduced when the acoustic image is captured or
during transmission. The different filtering techniques for
removing the noise are:


• Mean filter: A linear mean filter is used to smooth the
image by removing the noise. This technique slides a mask
(window) over the image and replaces each target pixel
with the average of the pixels in its neighborhood.

• Vector Median Filter (VMF): In the Vector Median
Filter (VMF), the point with the minimum sum of vector
differences is used to represent the pixel.

• Modified Spatial Median Filter (MSMF): In the
Modified Spatial Median Filter (MSMF), the spatial depth
of every point within the mask is first computed. This
information is then used to decide whether the mask's
center point is an uncorrupted point. If the point is
judged to be uncorrupted, it is left unchanged.
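
The first two filters in the list above can be sketched as follows. This is a minimal illustration that assumes a 3x3 window, Euclidean distance between color vectors, and border pixels averaged over the neighbors that exist; the function names are illustrative, and the MSMF is not reproduced.

```python
import math

def mean_filter(image):
    """3x3 mean filter; border pixels average over the neighbors that exist."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(h):
        out_row = []
        for c in range(w):
            window = [image[rr][cc]
                      for rr in range(max(0, r - 1), min(h, r + 2))
                      for cc in range(max(0, c - 1), min(w, c + 2))]
            out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out

def vector_median(vectors):
    """Return the vector with the smallest summed distance to all the others."""
    return min(vectors, key=lambda v: sum(math.dist(v, u) for u in vectors))

noisy = [[0, 0, 0],
         [0, 9, 0],
         [0, 0, 0]]
print(mean_filter(noisy)[1][1])   # 1.0: the spike is averaged away

window = [(10, 10, 10), (12, 11, 10), (255, 0, 0), (11, 10, 12)]
print(vector_median(window))      # one of the three similar colors
```

The mean filter spreads an outlier over its neighbors, whereas the vector median always returns a pixel that actually occurs in the window, so an isolated impulse such as `(255, 0, 0)` is never selected.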


Some filtering techniques: 


• Homomorphic filtering: Homomorphic filtering sharpens
the image by correcting non-uniform lighting conditions
in the frequency domain.

• Anisotropic filtering: The image features are simplified
by using anisotropic filtering. The homogeneous areas in
an image are smoothed by this filter, and edges are
preserved.

• Wavelet de-noising by average filter: The noise is
suppressed by using the average filter. Noise such as
Gaussian noise is naturally present in images captured by
instruments. The average filter is reported to give better
results than other de-noising methods.


VI. IMAGE SEGMENTATION TECHNIQUES 


Image segmentation aims to classify homogeneous areas
of an image into one of several types according to their
perceptual similarity or dissimilarity. Image segmentation
shares some common ground with clustering [10] in that both
try to group elements or objects (defined in whatever way)
based on their inherent properties. A clustering-based image
segmenter consists of three parts. First, feature vectors
are extracted from the input image for each image subunit
and presented in an appropriate form. The features can be,
for example, intensity (of multispectral band data or a color
image), range data, or any other features characterizing an
image. Then the feature vectors are passed to the cluster
estimator to find the number of true clusters. Finally, the
true cluster number is fed to the clustering segmenter, which
partitions the input feature vectors into subsets and labels
the corresponding image subunits.

1) Feature extraction: A clustering segmenter requires
feature vectors [4] to characterize each image entity so that
the entities can be grouped accordingly. AR model parameter
estimation is the first step in clustering-based textured
image segmentation [1,2,3]. A simple and efficient model
parameter estimation scheme, least-squared error (LSE)
estimation, is used. Feature extraction first divides the
image into non-overlapping blocks of size B x B. These
blocks represent the image subunits on which the
classification takes place. The AR model parameters are
estimated from each block through the LSE method and are
then used as feature vectors in the subsequent segmentation
stage.

2) True cluster number estimation : For a given set of 
objects, a difficulty is that a clustering algorithm [5] will 
generally converge for any number of clusters, even if 
no clusters actually exist. It is then reasonable to ask: 
if clusters do exist, how many are needed to reflect the 
true nature of the data set? This implies that a criterion, 
or clustering quality measurement, has to be defined so 
that the cluster number which optimizes this measurement 
could be assumed to reflect the number of inherent 
clusters in the data. Cluster number estimation can be 
put under the general heading of cluster validation[5]. 
Existing cluster number estimation techniques generally 
fall into two categories: statistical hypothesis testing 
methods and heuristic methods. Another approach used 
to estimate true cluster number is distance analysis. 

3) Segmentation with contextual information enhancement:
Contextual information plays an important role in making
an accurate segmentation of an image. In the conventional
clustering approach, feature vectors are treated as
independent entities.

Three schemes are proposed here to enhance the clustering
segmentation by incorporating contextual information. The
first and simplest is to post-process the clustering-segmented
image using a majority voting technique, so that the final
label for each block is influenced by the outcomes of its
neighbouring blocks. The second is to modify the feature vector
of each object participating in the clustering process
selectively, according to those of its neighbours. This
introduces contextual information directly at the feature
vector level. Finally, we can define a local criterion function
which takes into account the current object's feature vector
and the labelling of its neighbours. The classification of each
object is conducted iteratively according to this local
criterion function.
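
The first scheme, majority voting over neighbouring blocks, can be sketched as below. The 3x3 neighbourhood and the rule that a block is relabelled only on a strict majority are assumptions made for this illustration; the label map is taken to be the output of a clustering step that is not shown.

```python
from collections import Counter

def majority_vote(labels):
    """Relabel each block with the strict-majority label of its 3x3 window."""
    h, w = len(labels), len(labels[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            window = [labels[rr][cc]
                      for rr in range(max(0, r - 1), min(h, r + 2))
                      for cc in range(max(0, c - 1), min(w, c + 2))]
            winner, count = Counter(window).most_common(1)[0]
            # Keep the original label unless one label holds a strict majority.
            row.append(winner if count > len(window) / 2 else labels[r][c])
        out.append(row)
    return out

block_labels = [[0, 0, 0],
                [0, 1, 0],
                [0, 0, 0]]
print(majority_vote(block_labels))   # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

An isolated, probably misclassified block is absorbed by its surroundings, while a straight boundary between two large regions is left where it is.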


VII. MATHEMATICAL MORPHOLOGY 


The word morphology is used here in the context of
mathematical morphology, a tool for extracting image
components that are useful in the description and
representation of region shape, such as skeletons and
boundaries. Mathematical morphology is based on set theory.
As such, morphology offers a powerful and unified approach to
various image processing tasks. An essential part of the
morphological operations is the structuring element used to
probe the input image, which is itself a binary image. There
are two basic morphological operations:


1) Dilation: Dilation adds pixels to the boundaries of
objects in an image, whereas erosion removes pixels on
object boundaries. The number of pixels added to or
removed from the objects in an image depends on the shape
and size of the structuring element used to process the
image.

2) Erosion: The erosion operator has the opposite effect.
The result at a pixel is one only if the mask, placed at
that pixel, lies completely within the object. Objects
smaller than the mask disappear completely.


Using the elementary dilation and erosion operations, two
useful operations were developed to work on the form of
objects. Closing and opening are two important operators from
mathematical morphology, and both are derived from the
fundamental operations of dilation and erosion. The
morphological opening of an image is erosion followed by
dilation, using the same structuring element for both
operations. The related operation, morphological closing of an
image, is the reverse: it consists of dilation followed by
erosion with the same structuring element. Like dilation and
erosion, they are normally applied to binary images, although
there are also gray level versions. Opening is the dual of
closing, i.e. opening the foreground pixels with a particular
structuring element is equivalent to closing the background
pixels with the same element. When an erosion operation is
applied to an image, it eliminates the small objects present
in the image. Its drawback is that all the remaining objects
in the image shrink in size. To avoid this effect, we apply a
dilation operation to the eroded image with the same
structuring element. This combination of operations is called
the opening operation.
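
The four operations can be sketched for binary images as below. The 3x3 square structuring element and the treatment of pixels outside the image as background are assumptions made for this illustration; because the element is symmetric, no reflection is needed for dilation.

```python
SQUARE_3X3 = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

def _get(image, r, c):
    """Pixel value, treating everything outside the image as background (0)."""
    if 0 <= r < len(image) and 0 <= c < len(image[0]):
        return image[r][c]
    return 0

def erode(image, element=SQUARE_3X3):
    """A pixel stays 1 only if the element fits entirely inside the object."""
    return [[int(all(_get(image, r + dr, c + dc) for dr, dc in element))
             for c in range(len(image[0]))] for r in range(len(image))]

def dilate(image, element=SQUARE_3X3):
    """A pixel becomes 1 if the element touches the object anywhere."""
    return [[int(any(_get(image, r + dr, c + dc) for dr, dc in element))
             for c in range(len(image[0]))] for r in range(len(image))]

def opening(image, element=SQUARE_3X3):
    return dilate(erode(image, element), element)   # erosion, then dilation

def closing(image, element=SQUARE_3X3):
    return erode(dilate(image, element), element)   # dilation, then erosion

binary = [[0, 0, 0, 0, 0, 0, 0],
          [0, 1, 1, 1, 0, 0, 0],
          [0, 1, 1, 1, 0, 1, 0],   # isolated pixel at column 5
          [0, 1, 1, 1, 0, 0, 0],
          [0, 0, 0, 0, 0, 0, 0]]
for row in opening(binary):
    print(row)
```

Erosion alone would shrink the 3x3 object to a single pixel; the dilation that follows restores it to its original size, while the isolated pixel removed by the erosion does not come back.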


VIII. CONCLUSION 


In this paper, different image contrast enhancement
techniques are analyzed. Among these, CLAHE improves the
contrast of sharp details in both bright and dark regions due
to its window based nature. It produces better results than
global methods such as histogram equalization. Hence it is
widely used in medical fields. It has also recently been
reported that transformation techniques such as wavelets and
curvelets are more suitable for removing speckle noise in
acoustic images. It is concluded that, in representing
salient image features such as edges, lines, curves and
contours, the performance of the contourlet transform is
better than that of wavelets because of its anisotropy and
directionality. We have also discussed several segmentation
techniques. Among these, the cluster based technique is the
most commonly used.


REFERENCES 


[1] Saka Kezia, I Shanti Prabha, V Vijaya Kumar, “A New Texture
Segmentation Approach for Medical Images”, IJSER, Vol. 4, Iss. 1,
January 2013.

[2] M Joseph Prakash, Saka Kezia, I Santi Prabha, V VijayaKumar, “A
novel approach for texture segmentation based on rotationally invariant
patterns”, IJCES, Jan 2013.

[3] M Joseph Prakash, V Vijayakumar, “A new texture based segmentation
method to extract object from background”, GJCST, Graphics & Vision,
Vol. 12, Iss. 15, Dec 2012.

[4] Reed T, and Hans Du Buf, J, “A review of recent texture segmentation
and feature extraction techniques”, CVGIP, Image Underst., 1993, 57,
pp. 359-372.

[5] Hartigan, J.A.: “A k-means clustering algorithm”, Appl. Stat.,
1979, 28, pp. 100-108.

[6] L. M. Jan, F. C. Cheng, C. H. Chang, S. J. Ruan and C. A. Shen,
“A Power-Saving Histogram Adjustment Algorithm for OLED-Oriented
Contrast Enhancement,” in Journal of Display Technology, vol. 12, no.
4, pp. 368-375, April 2016.

[7] P. Rasti, M. Daneshmand, F. Alisinanoglu, C. Ozcinar and G.
Anbarjafari, “Medical image illumination enhancement and sharpening
by using stationary wavelet transform,” 2016 24th Signal Processing
and Communication Application Conference (SIU), Zonguldak, 2016,
pp. 153-156.

[8] Mithilesh Kumar and Ashima Rana, “Image Enhancement using
Contrast Limited Adaptive Histogram Equalization and Wiener filter,”
International Journal of Engineering and Computer Science, ISSN:
2319-7242, Volume 5, Issue 6, June 2016, pp. 16977-16979.

[9] R. Priyadharsini, T. Sree Sharmila, V. Rajendran, “Underwater acoustic
image enhancement using wavelet and K-L transform,” International
Conference on Applied and Theoretical Computing and Communication
Technology, Oct 2015.

[10] Everitt B, Cluster analysis, Halsted Press, New York, 1974, pp. 40-46.


Chippy V P et al., “A Survey on Image Enhancement and Segmentation Techniques” 




Proceedings of the Vidya Computer Applications Departmental Seminar (VCADS - 2018), 4 - 5 April 2018 
Department of Computer Applications, Vidya Academy of Science & Technology, Thrissur - 680501

Abstract—In recent years the computer has come to play an
important role in our lives. This paper presents a review of
the research on the impact of technology on students’ lives.
The interest in media begins in childhood, through watching
cartoons and movies on television. Technology itself has now
become a valuable asset to education: the learning process is
being enhanced with new means of visualizing the objects
encountered, which are hard to find in traditional teaching.
The review covers research on the impact of computer
technology on students’ and adolescents’ development,
students’ enjoyment and use of technology at home and in
school, technology as a threat to the development of students’
lives, the benefits for students of computer based technology
in the classroom, and parents’ perception of students’ usage
of computer technology. The paper concludes with
recommendations for future study in order to better understand
the growing impact of the computer on a student.

Index Terms—Impact of Computer, Students’ Life, Adoles- 
cence Development 


I. INTRODUCTION 


The time is ripe to assess the impact of technology use
on student development. Over the past few years, a growing
number of U.S. households have added electronic games,
home computers, and the Internet to the other technologies
(the telephone, radio, TV, and stereo system) that consume
children’s time. Furthermore, the Annenberg Public Policy
Center has reported that among U.S. households with students
aged 8 to 17, 60% had home computers, and students in 61% of
households with computers had access to Internet services.
Much of the previous literature shows the limited use and
range of technologies students experience in the school
environment. In addition, past research does not address the
effect of experience with IT in the home and the school on
students’ attitudes and motivation towards IT. The focus of
this study is to examine how primary school students perceive
and experience the use of computers in the home and the school,
and then to explore the implications of these perceptions and
experiences for schools and teachers.

In examining the impact of technology use, we have
primarily looked at two popular applications of the computer:
games and the Internet. Because games played on a
computer are similar to games played on other platforms (e.g.,
stand-alone game sets such as Nintendo and Sega, or hand-held
games such as Gameboy), we use the term “computer
games” inclusively to refer to all kinds of interactive games
regardless of platform. Even the distinction between games
and the Internet is getting blurry, as interactive games can
be played on the Internet. With the expected convergence of
different media in the near future, assessing the impact of
computer technology on students will only get more complex
and challenging.


Salkala K S
Assistant Professor of Computer Applications
Vidya Academy of Science & Technology
Thrissur-680501

With the increased role of home computers in students’
lives has come increased concern about how students may be
affected. Time spent on home computers may displace other
activities that have more developmental value, and the merit
of the computer-based activities has also been questioned.
Surveys of parents suggest that they buy home computers and
subscribe to Internet access to provide educational
opportunities for their children and to prepare them for the
“information age”. Although research on the effects of
students’ use of home computers is still sketchy and
ambiguous, some initial indications of positive and negative
effects are beginning to emerge.


II. IMPACT OF TECHNOLOGY ON STUDENTS AND 
ADOLESCENCE DEVELOPMENT 


While the research on whether computers are a positive
influence in students’ lives is mostly sketchy and ambiguous,
some initial findings are beginning to emerge. This article
starts with a discussion of the time spent by students on
computers and the impact of such computer use on other
activities such as television viewing. We then review the
available research on the effects of computer use on students’
cognitive and academic skill development, social development
and relationships, as well as perceptions of reality and
violent behavior.


A. Time Spent on Technological Things


Parents in the Annenberg survey report that students
(between 2 and 17 years) in homes with computers spend
approximately 1 h and 37 min a day on computers, including
video games (Stanger & Gridina, 1999). In the HomeNet
study, machine records of weekly usage averaged across
approximately 2 years of data between 1995 and 1998 show
that among the teens who had access to the Internet at home,
usage averaged about 3 h/week during weeks when they used
it, and over 10% used it more than 16 h/week. Students in
the study were much heavier users of the Internet and all its
services than were their parents.

The students used the Internet for schoolwork, for com- 
munication with both local and distant friends, and to have 
fun, especially by finding information related to their interests 
and hobbies. They were also more likely to use the Internet 
to listen to music, play games, and download software. In 
contrast, adults were more likely to use the Internet for 
instrumental purposes such as getting product information, 
purchasing products, or supporting their employment. Teens 
also used the Internet for instrumental purposes, such as doing 
schoolwork and finding educational material. 


B. Computer Games and the Development of Cognitive Skills 


Many computer applications, especially computer games,
have design features that shift the balance of required
information processing from verbal to visual. The very
popular action games, which are spatial, iconic, and dynamic,
have things going on at different locations. The suite of skills
students develop by playing such games can provide them
with the training wheels for computer literacy, and can help
prepare them for science and technology, where more and
more activity depends on manipulating images on a screen.
We now summarize the experimental evidence for the role of
computer games in developing cognitive skills. Although the
term “cognitive skills” encompasses a broad array of skills,
most of the research has focused on components of visual
intelligence, such as spatial skills and iconic representation.
These skills are crucial to most video and computer games as
well as to many computer applications. Computer hardware and
software evolve so quickly that most of the published research
on the cognitive impact of game playing has been done with
the older generation of arcade games and game systems.


C. Effects on Social Development and Relationships 


Few studies have examined the effect of children’s time on
computers on their social skills and friendships. The extant
research suggests that frequent game players actually meet
friends outside school more often than less frequent players.
Similarly, teenagers in the HomeNet sample reported that
keeping up with both local and distant friends was a very
important use of the Internet for them. Interpersonal
communications via electronic mail (e-mail) were more
important to them than information acquisition via the web.
Although it is clear that the Internet is frequently used for
social purposes by teens, it is not immediately obvious whether
these social uses add to or diminish teenagers’ stock of
social resources.


III. STUDENTS’ ENJOYMENT AND USE OF TECHNOLOGY 
IN HOME AND SCHOOL 


The results of the study provide further evidence of the
growing gap between computer experiences in the home and
school environments. Consistent with other research findings,
this study found that students made more use of the technology
at home than at school (Colley, Gale & Harris, 1994; Downes,
1996, 1999; Kirkman, 1993; Selwyn, 1998; Shoffner, 1990).
There is a growing gap between students’ experience of
computers in their two environments of home and school in terms
of frequency of use, type of technology, software used and
style of interaction.


A. Differences in Computers’ Use at Home and School 


Seventy-seven percent of students regularly used their
computers at home for playing games. Computer game playing
is the most popular activity on the home computer for both
sexes. Similar results have been reported in the literature
(Cunningham, 1994; Durkin, 1995; Fife-Schaw, Breakwell,
Lee & Spencer, 1986; Kirkman, 1993; Mohamedali, Messer
& Fletcher, 1987; Sutherland et al., in press; Underwood et
al., 1994). Kirkman (1993) reported in his study that over 90%
of students used their computers at home for games. Downes
(1996, 1999) also reported that the most common activity with
the computer was playing games.

Clearly, students in the present study perceived the home
computer as a games machine. When comparing actual computer
experiences across home and school activities, the students
reported that they were happy to use computers at home, with
game playing being the most enjoyable activity.


B. Negative Impact of School Use of Computer 


The negative attitudes towards and limited use of school
computers are further supported by inspection evidence of IT
across primary schools in the UK (Ofsted, 1998). Pupils were
found to have limited exposure to IT, and one school in five
observed at Key Stage 1 and 2 was not complying at all with
the requirements of the National Curriculum in information
technology. The major reasons for the greater incidence of
unsatisfactory teaching of IT were teachers’ lack of knowledge
of the subject and inadequate planning and organization of
lessons and tasks.


C. Positive Impact of Home Computer 


Research shows that the positive effect of access to a home
computer on pupils’ attitudes is irrefutable; children had more
favourable attitudes to using the computer at home than at
school and were equally confident about using computers
in both the home and school environments. Research also
suggests that attitudes formed at home dominate pupils’
attitudes to computers, and it is these attitudes which are
carried into the school environment. Those children who are
confident about using computers at home will be confident
about using them at school. In the same way, it could be that
those children who disliked computers at school, and therefore
had a negative attitude towards computers at school, did so
because their classroom IT experiences were not as rich as
their home experiences. Mere experience with technology
in the classroom is not enough; the experience has to be
successful and it has to stretch the user.


IV. TECHNOLOGY AS A THREAT FOR DEVELOPMENT OF
STUDENTS’ LIFE


There are undoubted benefits of using technology, but its
many dangers cannot be forgotten. Among them, addiction,
inability to function in the real world and entering into risky
relations are becoming more and more common. Importantly,
these dangers apply to a greater degree to students, as their
ability to control their own behavior has not yet developed.


A. Abuse of Technology 


In the area of education, these risks relate primarily to
stimulation of the mechanical reflex of acquiring news,
replacement of one’s own studies by ready-made elaborations,
use of summaries and overviews, and often uncritical copying
of materials published on the Internet. In the psychosocial
and educational sphere, it is worth noting the threat of
addiction, inability to function in the real world, or entering
into risky relationships. Importantly, these dangers relate
particularly to students, since their ability to control their
own behavior has not yet developed.


B. Lost in Paradise 


Excessive time spent on the Internet by students and young
people significantly reduces social contact with their peer
group and interferes with the process of socialization. Virtual
friends, who in childhood are the heroes of movies or video
games, are often the only characters forming the child’s social
environment. M. Sajkowska emphasizes that students have a
strong need to have a friend. The real world requires a certain
amount of compromise and some conformity in building
relationships with peers.

In such a case, the computer contributes to disturbing or even 
destroying personal relationships, to withdrawal from 
participation in the outside world, and to loss of the ability 
to make direct contact. Time spent in solitude increases 
compared with time spent in social groups, which in fact gives 
rise to a sense of loneliness. To deal with it, we increasingly 
use the computer and the Internet, trying to gain a sense of 
participation in social life. However, we do not notice the 
superficiality and fragility of these contacts. 


V. BENEFITS FOR STUDENTS WITH COMPUTER BASED 
TECHNOLOGY 


Today schools face increasing demands in their attempts 
to ensure that students are well equipped to enter the workforce 
and navigate a complex world. This article explores the 
various ways computer technology can be used to improve how 
and what students learn in the classroom. Technology can 
enhance learning by supporting four main characteristics of 
learning: active engagement, participation in groups, frequent 
interaction and feedback, and connections to real-world 
contexts. Although the classroom tools of blackboards and 
books that shape how learning takes place have changed little 
over the past century, societal demands on what students learn 
have increased dramatically. 

Debate now focuses on identifying and implementing the 
most appropriate and highest-priority reforms in the areas of 
curricula, teacher training, student assessment, administration, 
buildings, and safety. The role that technology could or should 
play within this reform movement has yet to be defined. 
Innovations in media technology, including radio, television, 
film, and video, have had only isolated, marginal effects on 
how and what students learn in school, despite early champions 
of their revolutionary educational potential. Although computer 
technology is a pervasive and powerful force in society today 
with many proponents of its educational benefits, it is also 
expensive and potentially disruptive or misguided in some of 
its uses, and in the end may have only marginal effects. 


A. Learning through Active Engagement 


Active engagement in learning can be achieved in the classroom 
with or without computers. Consider science laboratories, where 
students are actively engaged in experiments without computers. 
Constructing presentations and multimedia designs are other 
ways of engaging students. Using technology to engage students 
more actively in learning is not limited to science and 
mathematics. For example, computer-based applications such as 
desktop publishing and desktop video can be used to involve 
students more actively in constructing presentations that 
reflect their understanding and knowledge of various subjects. 
Although previous media technologies generally placed children 
in the role of passive observers, these new technologies make 
content construction much more accessible to students. In one 
project, the participating students showed significant gains in 
task engagement and self-confidence measures compared with 
students enrolled in a more traditional computer class. 


B. Learning through Participation in Group 


Performing a task with others provides an opportunity not 
only to imitate what others are doing but also to do the tasks 
and make thinking visible. Through conversation and gestures, 
students and teachers can resolve misunderstandings and correct 
mistakes. Several computer-based applications, such as 
tutorials and drill-and-practice exercises, do engage students 
individually. However, some of the most prominent uses of 
computers today are communications oriented, and networking 
technologies such as the Internet and digital video permit 
a broad new range of collaborative activities in schools. 
Using technology to promote such collaborative activities can 
enhance the degree to which classrooms are socially active 
and productive, and can encourage classroom conversations 
that expand students’ understanding of the subject. 


C. Learning through Frequent Interactions and Feedback 


Computer tools can engage students for extended periods 
on their own or in small groups. This creates more time for 
the teacher to give individual feedback to particular students. 
In some situations, computer tools can assess students’ 
performance and provide feedback. Research indicates that 
computer applications such as those described above can be 
effective tools to support learning. One study compared two 
methods of e-mail based coaching. In the first method, tutors 
generated a custom response for each student. In the second, 
tutors sent the student an appropriate boilerplate response. 
Students’ learning improved significantly and approximately 
equally using both methods, but the boilerplate-based coaching 
allowed four times as many students to have access to a tutor. 


D. Learning through Connections to Real-world Context 


Computer technology can provide students with an excellent 
tool for applying concepts in a variety of contexts, thereby 
breaking the artificial isolation of school subject matter from 
real-world situations. For example, through the communication 
features of computer-based technology, students have access to 
the latest scientific data and expeditions, whether from a 
National Aeronautics and Space Administration (NASA) mission 
to Mars, an ongoing archaeological dig in Mexico, or a 
remotely controlled telescope in Hawaii. Further, 
technology can bring unprecedented opportunities for students 
to actively participate in the kind of experimentation, design, 
and reflection that professionals routinely do, with access to 
the same tools professionals use. 


VI. PARENTS’ PERCEPTION ON THE STUDENTS’ USAGE 
OF COMPUTER TECHNOLOGY 


Many families today have access to computers that help 
them with their daily living activities, such as finding 
employment and helping students with schoolwork. With more 
families owning personal computers, questions arise as to the 
role they play in these households. 


A. Parental Influence on Their Children’s Learning 


Parents play an important role in the education of their 
children (Hoover-Dempsey & Sandler, 1997). They are oftentimes 
the entry point, the initial contact by which young 
children are exposed to the function, purpose, and value of 
a computer, and their attitudes greatly impact those of the 
child (Sanger, 1997). If parents hold favorable perceptions of 
a learning tool, such as computers, then in all likelihood the 
child will adopt similar attitudes. Thus, a computer can 
be beneficial or detrimental to a young learner depending on 
how it is modeled as a training tool and the attitudes held 
toward it by the parents. 


B. Parental Perceptions of Computer Use 


Although more parents use computers today than two decades 
ago (U.S. Department of Labor, 1999), few studies address 
their perceptions of computer use. The few studies that exist 
suggest that parents associate computer use with academic 
achievement and job success. For example, in one early study, 
Visser (1987) found that parents desired computers as part of 
their children’s education and believed that with computers, 
achievement scores would increase. 


VII. CONCLUSION 


Teenagers use the computer more than younger students or 
adults. Use is also greater for boys compared to girls, for 
Whites compared to Black or Hispanic students, and for students 
in households with higher parental income and education. 
Students still seem to be spending more time watching television 
than using computers, although computer users watch less 
television than non-computer users. 

While much of the time on computers is spent alone, moderate 
computer use does not negatively impact students’ social 
skills and activities. On the contrary, e-mail and the Internet 
may actually help maintain interpersonal communication and 
sustain social relationships. During computer use, students 
should be reminded to take breaks and made aware of the scale 
of the dangers carried by excessive contact with the computer. 
Young users should be encouraged toward moderation and 
prudence, as well as toward alternative forms of leisure 
activity. Parents, relatives and teachers should pay attention 
both to the time students spend in front of the computer and 
to the content of games and the Internet. 

There does seem to be a case for arguing that a range of 
cognitive skills is practiced in computer game playing, given 
the sheer number of decisions students make as they weave 
their way through various games. In particular, within the 
rule-governed game environment, they are often involved in 
breaking codes and making decisions at a fast pace. Playing 
these games involves constant visual, aural, and mindful 
concentration. Although computers and the Internet are widely 
used by students for schoolwork and to obtain information, more 
and better evidence is needed to support the claim that home 
computer use can improve school performance. More research 
is necessary to determine if use of home computers can have 
significant, long-term effects on cognitive skills and academic 
achievement. 


Abstract—This paper investigates cloud computing, key 
management and the challenges of key management in the cloud. 
Cloud computing is the practice of using a network of remote 
servers hosted on the Internet to store, manage, and process 
data, rather than a local server or a personal computer. Cloud 
systems can be used to enable data sharing capabilities, and 
this can provide an abundance of benefits to the user. The 
cloud, however, is susceptible to many privacy and security 
attacks. Proper key management leads to more secure and 
confidential data, which can aid secure and private sharing of 
data in the Cloud. Key management is anything you do with a key 
except encryption and decryption, and covers the 
creation/deletion of keys, activation/deactivation of keys, 
transportation of keys, storage of keys and so on. Most Cloud 
service providers provide basic key encryption schemes for 
protecting data or may leave it to the user to encrypt their own 
data. The overview of cryptographic key management describes 
commonly used key types, the key states and key management 
functions. This document identifies the cryptographic key 
management issues that arise for the three main cloud service 
types (IaaS, PaaS and SaaS) due to the distributed nature of IT 
resources, as well as the distributed nature of their control, 
the latter split among multiple cloud actors. 

Index Terms—Cloud computing, key management, challenges 
in the cloud 


I. INTRODUCTION 


Cloud computing is a technology that uses network 
connectivity and central remote servers to store and manage 
data and to run applications. Since its appearance, cloud 
computing has proved its ROI advantages to managers and has 
become an important business model in which computational 
resources are rented to customers by providers. The key enabler 
of cloud computing is virtualization technology, not a new 
technology but a “reinvented” one that takes full advantage of 
the hardware and software developments that have appeared 
since its introduction. 

Since its appearance, even though the technology promised 
great advantages to stakeholders, the spread of cloud computing 
has been limited by the security risks implied by outsourcing 
data to third-party infrastructure. Traditional network 
architecture, with its local data storage and manipulation 
mechanisms and corresponding security mechanisms, has been 
quickly abandoned, and data storage, manipulation, and even 
computer and networking hardware have been moved to the cloud. 
This poses a tremendous new challenge to cloud security 
developers, who must ensure the security of data even though a 
typical cloud user does not know the exact location of their 
data or the other sources of the data collectively stored with 
theirs, and cannot apply most of the traditional defense 
mechanisms. To interact with 
various services in the cloud and to store the data gener- 
ated/processed by those services, several security capabilities 
are required. Based on a core set of features in the three 
common cloud services - Infrastructure as a Service (IaaS), 
Platform as a Service (PaaS) and Software as a Service (SaaS), 
we identify a set of security capabilities needed to exercise 
those features and the cryptographic operations they entail. An 
analysis of the common state of practice of the cryptographic 
operations that provide those security capabilities reveals that 
the management of cryptographic keys takes on an additional 
complexity in cloud environments compared to enterprise IT 
environments due to: (a) difference in ownership (between 
cloud Consumers and cloud Providers) and (b) control of 
infrastructures on which both the Key Management System 
(KMS) and protected resources are located. This document 
identifies the cryptographic key management challenges in the 
context of architectural solutions that are commonly deployed 
to perform those cryptographic operations. 


The objectives of this document are to identify: 


a) The cryptographic key management issues that arise 
due to the distributed nature of IT resources, as well 
as the distributed nature of their control, the latter split 
among multiple cloud actors. Furthermore, the pattern 
of distribution varies with the type of service offering - 
Infrastructure as a Service (IaaS), Platform as a Service 
(PaaS) and Software as a Service (SaaS). 

b) The special challenges involved in deploying 
cryptographic key management functions that meet the 
security requirements of the cloud Consumers, depending 
upon the nature of the service and the type of data 
generated/processed/stored by the service features. 


II. CLOUD COMPUTING AND SECURE DATA SHARING IN 
THE CLOUD 


A. Cloud Computing 


1) Cloud Computing Definition: According to the U.S. 
National Institute of Standards and Technology’s (NIST) 
definition, “cloud computing is a model for enabling 
convenient, on-demand network access to a shared pool of 
configurable computing resources (e.g., networks, servers, 
storage, applications and services) that can be rapidly 
provisioned and released with minimal management effort or 
service provider interaction” [3]. NIST provides a taxonomy of 
three service models available to cloud consumers: cloud 
software as a service (SaaS), cloud platform as a service 
(PaaS), and cloud infrastructure as a service (IaaS). It also 
summarizes four deployment models describing how the computing 
infrastructure that delivers these services can be shared: 
private cloud, community cloud, public cloud, and hybrid cloud. 

NIST defines the three service models as follows: 


i) Infrastructure as a Service (IaaS): The capability 
provided to the Consumer is to provision processing, 
storage, networks, and other fundamental computing 
resources where the Consumer is able to deploy and 
run arbitrary software. 

ii) Platform as a Service (PaaS): The capability provided 
to the Consumer is to deploy Consumer-created or 
acquired applications onto the cloud infrastructure that 
are created using programming languages and tools 
supported by the Provider. 

iii) Software as a Service (SaaS): The capability provided 
to the Consumer is to use the Provider’s applications 
running on a cloud infrastructure. 


The four cloud deployment models are: 


i) Public cloud: A public cloud is one in which cloud 
infrastructure services are provided to the general 
public or a large industry group over the Internet. In 
this cloud model, the infrastructure is owned not by the 
user but by the organization that provides the cloud 
services. 

The storage, backup and retrieval services in this model 
are provided free of charge, by subscription, or on a 
pay-per-use basis. Examples of public clouds: 


• Amazon Elastic Compute Cloud (EC2) 
• IBM SmartCloud Enterprise 
• Google App Engine 
• Windows Azure Services Platform 

ii) Private cloud: A private cloud is one in which the 
cloud infrastructure is set aside for the exclusive use 
of a single organization. It is owned, managed and 
operated by the organization, a third party, or a 
combination of both. The cloud infrastructure in this 
model is provisioned on the premises of the organization 
or hosted in a data center owned by a third party. A 
private cloud gives organizations an advantage over a 
public cloud in that it provides them greater flexibility 
of control over cloud resources. Moreover, a private 
cloud is useful in storage applications wherein security, 
latency and regulatory issues are of utmost concern. 

iii) Hybrid cloud: As the name suggests, a hybrid cloud 
is a combination of other cloud models, viz. public 
cloud, private cloud or community cloud. This model 
takes advantage of all the models that are part of it. 
Hence it can offer scalability, cost effectiveness and 
data security all in one model. The disadvantage of 
this model is the difficulty of implementing such a 
storage solution. 

iv) Community cloud: A community cloud shares the cloud 
infrastructure across several organizations to support 
a specific community having common concerns. In this 
model, the cloud infrastructure is provided on the 
premises or at a data center owned by a third party. It 
is managed by the participating organizations or by a 
third party. A community cloud takes the benefits of 
both the public cloud (e.g., minimal shared 
infrastructure costs, pay-per-use billing) and the 
private cloud (e.g., added privacy level, policy 
compliance). 

2) Cloud Computing Reference Architecture : Figure 1 
presents an overview of the NIST cloud computing reference 
architecture[4], which describes five major cloud actors. The 
five major participating actors are the Cloud Consumer, Cloud 
Provider, Cloud Broker, Cloud Auditor and Cloud Carrier. 


Fig. 1. The Conceptual Reference Model 


A Cloud Consumer is an individual or organization that 
acquires and uses cloud products and services. The purveyor 
of products and services is the Cloud Provider. A Cloud 
Auditor is a party that can conduct independent assessment of 
cloud services, information system operations, performance 
and security of the cloud implementation. A Cloud Broker is an 
entity that manages the use, performance and delivery of cloud 
services, and negotiates relationships between Cloud Providers 
and Cloud Consumers. A Cloud Carrier is an intermediary that 
provides connectivity and transport of cloud services from 
Cloud Providers to Cloud Consumers. NIST identifies two 
types of cloud Providers, Primary Provider and Intermediary 
Provider, and two types of cloud Brokers, Business Broker 
and Technical Broker. 


B. Secure Data Sharing In The Cloud 


1) Why Data Sharing Is Important: Data sharing is be- 
coming increasingly important for many users and sometimes 
a crucial requirement, especially for businesses and organiza- 
tions aiming to gain profit. 

Some of the benefits include: 


Higher productivity: Businesses get more work done and 
collaborate with peers much more efficiently, which is 
key to meeting their business goals. Hospitals also 
benefit from data sharing, and this has led to the 
lowering of healthcare costs. Students also benefit when 
working on group projects, as they are better able to 
collaborate with members and get work done more 
efficiently. 

More enjoyment: People of any age, gender or 
ethnicity can connect with friends, family and colleagues 
to share their experiences in life as well as catch up with 
others via social networking sites such as Facebook or 
MySpace. Employees and enterprise users can share their 
experiences through sites like Yammer. People can also 
share videos on YouTube or photos on Flickr, which can 
provide greater enjoyment for some people. In the past, 
connecting with a loved one in a different country was not 
possible except through letters. Hence social data sharing 
generally provides people with a rich experience, as the 
sharing of personal information can provide people with 
deeper and stronger relationships. 

To voice opinions: Some people prefer to share 
information with the world in order to voice an opinion. 
Many people want to be heard and use social networking sites 
to promote their opinion, which was previously not possible 
unless they formed protests. People are now using social 
networking sites such as Facebook, Twitter and YouTube to 
raise awareness about real issues in the world. Although some 
campaigns have led to violent protests, online campaigns 
usually inform people of issues and encourage people to 
help a cause. 


2) Requirements of Data Sharing in the Cloud: To enable 
data sharing in the Cloud, it is imperative that only authorized 
users are able to get access to data stored in the Cloud. We 
summarise the ideal requirements of data sharing in the Cloud 
below. 


The data owner should be able to specify a group of users 
that are allowed to view his/her data. 

Any member of the group should gain access to the data 
anytime without the data owner’s intervention. 

No other user, other than the data owner and the members 
of the group, should gain access to the data, including the 
Cloud Service Provider. 

The data owner should be able to revoke access to data 
for any member of the group. 

The data owner should be able to add members to the 
group. 


No member of the group should be allowed to revoke the 
rights of other members of the group or add new users 
to the group. 

The data owner should be able to specify who has 
read/write permissions on the data owner’s files. 
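
The access-control requirements listed above can be made concrete with a small sketch. The class and method names below (`SharedData`, `add_member`, `revoke`, `can`) are hypothetical and chosen only for illustration. The sketch models the policy alone: only the owner may change the group, and each member holds explicit read/write permissions. It does not address the requirement that the Cloud Service Provider itself be excluded, which would need encryption on top of such a policy.

```python
from dataclasses import dataclass, field


@dataclass
class SharedData:
    """Hypothetical in-memory model of one shared item and its group."""
    owner: str
    # Maps each group member to a set of permissions, e.g. {"read"}.
    members: dict[str, set[str]] = field(default_factory=dict)

    def _require_owner(self, actor: str) -> None:
        # Only the data owner may add or revoke group members.
        if actor != self.owner:
            raise PermissionError(f"{actor} is not the data owner")

    def add_member(self, actor: str, user: str, perms: set[str]) -> None:
        self._require_owner(actor)
        self.members[user] = set(perms)

    def revoke(self, actor: str, user: str) -> None:
        self._require_owner(actor)
        self.members.pop(user, None)

    def can(self, user: str, perm: str) -> bool:
        # The owner always has access; members need the permission.
        if user == self.owner:
            return True
        return perm in self.members.get(user, set())


doc = SharedData(owner="alice")
doc.add_member("alice", "bob", {"read"})
print(doc.can("bob", "read"))    # True
print(doc.can("bob", "write"))   # False
```

A member who tries to call `add_member` or `revoke` receives a `PermissionError`, which mirrors the rule that group members cannot change the group.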


We now look at the privacy and security requirements of 
data sharing in the Cloud[2]. Achieving these requirements 
in the Cloud architecture can go a long way toward attracting 
large numbers of users to adopt and embrace Cloud technology. 


Data Confidentiality: Unauthorised users (including the 
Cloud) should not be able to access data at any given 
time. Data should remain confidential in transit, at rest 
and on backup media. Only authorised users should be 
able to gain access to data. 

User revocation: When a user’s access rights to data are 
revoked, that user should not be able to gain access to the 
data at any given time. Ideally, for efficiency, user 
revocation should not affect other authorised users in 
the group. 

Scalable and Efficient: Since the number of Cloud users 
tends to be extremely large and at times unpredictable 
as users join and leave, it is imperative that the system 
remain efficient as well as scalable. 

Collusion between entities: When considering data sharing 
methodologies in the Cloud, it is vital that even 
when certain entities collude, they should still not be 
able to access any of the data without the data owner’s 
permission. Earlier works of literature on data sharing did 
not consider this problem; however, collusion between 
entities can never be written off as an unlikely event. 


3) Types of Attacks on the Cloud: There are a number 
of types of privacy and security attacks in the Cloud. The 
following contains a summary of the common types of attacks 
that may occur in the Cloud. 


XML Signature Wrapping Attacks: Using different kinds 
of XML signature wrapping attacks, one can completely 
take over the administrative rights of the Cloud user and 
create, delete and modify images as well as create instances. 

Cross-site scripting attacks: Attackers can inject a 
piece of code into web applications to bypass access 
control mechanisms. Researchers found this possible with 
Amazon Web Services. They were able to gain free access 
to all customer data, authentication data, tokens as well 
as plaintext passwords. 

Flooding Attack Problem: Provided a malicious user 
can send requests to the Cloud, he/she can then easily 
overload the server by creating bogus data requests to 
the Cloud. The aim is to increase the workload of the 
Cloud servers by consuming lots of resources needlessly. 

Denial-of-Service Attacks: Malicious code is injected 
into the browser to open many windows and as a result 
deny legitimate users access to services. 

Law Enforcement Requests: When the FBI or a government 
demands that a Cloud Service Provider grant access to 
its data, the Cloud Service Provider is unlikely to 
refuse. Hence there is an inherent threat to user privacy 
and the confidentiality of data. 

Data Stealing Problem: A term used to describe the 
stealing of a user account and password by any means, 
such as through brute-force attacks or over-the-shoulder 
techniques. The privacy and confidentiality of the user’s 
data will be severely breached. A common mechanism to 
prevent such attacks is to include an extra value when 
authenticating. This value can be distributed to the right 
user by SMS and hence mitigate the likelihood of data 
confidentiality issues. 
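
The "extra value" countermeasure mentioned for the data stealing problem can be sketched as a one-time code check. Everything here is an assumption made for illustration: the function names `issue_code` and `verify_code`, the six-digit format, the five-minute validity window, and the in-memory store. Delivery by SMS is outside the sketch; the issued code would simply be handed to whatever gateway is in use.

```python
import hmac
import secrets
import time
from typing import Optional

CODE_TTL_SECONDS = 300  # assumed validity window, not from the source

# user -> (code, time issued); a real system would use a shared store.
_pending: dict = {}


def issue_code(user: str, now: Optional[float] = None) -> str:
    """Create a random six-digit code to be sent out of band (e.g. SMS)."""
    code = f"{secrets.randbelow(10**6):06d}"
    issued = time.time() if now is None else now
    _pending[user] = (code, issued)
    return code


def verify_code(user: str, submitted: str,
                now: Optional[float] = None) -> bool:
    """Accept the code once, and only within the validity window."""
    entry = _pending.pop(user, None)  # pop makes every code single use
    if entry is None:
        return False
    code, issued = entry
    current = time.time() if now is None else now
    if current - issued > CODE_TTL_SECONDS:
        return False
    # Constant-time comparison avoids leaking how many digits matched.
    return hmac.compare_digest(code.encode(), submitted.encode())
```

A stolen password alone is then not enough to log in, since the attacker would also need the code sent to the legitimate user's phone.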


4) The Need for Key Management in the Cloud: Key 
management is anything you do with a key except encryption 
and decryption[5] and covers the creation/deletion of keys, 
activation/deactivation of keys, transportation of keys, storage 
of keys and so on. Most Cloud service providers provide basic 
key encryption schemes for protecting data or may leave it to 
the user to encrypt their own data. 


We discuss the 3 requirements of effective key management 
below. 


Secure key stores: The key stores themselves must be 
protected from malicious users. If a malicious user gains 
access to the keys, they will then be able to access any 
encrypted data to which the keys correspond. Hence the key 
stores themselves must be protected in storage, in transit 
and on backup media. 

Access to key stores: Access to the key stores should 
be limited to the users that have the rights to access data. 
Separation of roles should be used to help control access. 
The entity that uses a given key should not be the entity 
that stores the key. 

Key backup and recoverability: Keys need secure backup 
and recovery solutions. Loss of keys, although effective 
for destroying access to data, can be highly devastating to 
a business, and Cloud providers need to ensure, through 
backup and recovery mechanisms, that keys are not lost. 


III. CRYPTOGRAPHIC KEY MANAGEMENT OVERVIEW 


In this section, we review the two broad categories of 
cryptographic keys, list the most commonly used key types, 
identify the key states and chart the resulting transition 
diagram[1]. 


A. Key Types 


Cryptographic keys fall into two broad categories: 


i) Secret key: A key that is generally used 1) to perform 
encryption/decryption using symmetric cryptographic 
algorithms; and/or 2) to provide data integrity using 
message authentication codes (e.g., a Hash-based 
Message Authentication Code or HMAC) or an encryption 
mode of operation that also provides data integrity. 
A secret key is also called a symmetric key, since the 
same key is required for encryption and decryption or 
for integrity value generation and integrity verification. 

ii) Public/Private Key Pair: A pair of mathematically 
related keys used in asymmetric cryptography for 
authentication, digital signature, or key establishment. 
As the name indicates, the private key is used by the 
owner of the key pair, is kept secret, and should be 
protected at all times, while the public key can be 
published and used by the relying party to complete 
the protocol or invert the operations performed with 
the private key. 
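
The symmetric property of a secret key, where the same key both generates and verifies an integrity value, can be illustrated with Python's standard `hmac` module. This is a minimal sketch under stated assumptions: the helper names `make_tag` and `verify_tag`, the choice of SHA-256, and the sample messages are illustrative and not taken from the source.

```python
import hashlib
import hmac
import secrets


def make_tag(key: bytes, message: bytes) -> bytes:
    """Generate an HMAC-SHA256 integrity value for the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag with the SAME key and compare in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


key = secrets.token_bytes(32)            # the shared secret key
tag = make_tag(key, b"quarterly report")

print(verify_tag(key, b"quarterly report", tag))   # True
print(verify_tag(key, b"tampered report", tag))    # False
```

Because generation and verification use one key, anyone able to verify a tag could also forge one. That limitation is what the public/private key pair avoids: the verifier holds only the public key.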


From these broad categories one can determine the most 
commonly used key types in a cloud computing environment. 
Additional types of keys are: 


i) Public/Private Authentication Key Pair: This key pair 
is used by one party (peer, client or server) to 
authenticate to the other party. Its typical use entails 
combining a random challenge with the signer-generated 
random number and signing the result for the benefit 
of the challenger who wishes to authenticate the 
private-key holder. Examples of usage include 
client-authenticated Transport Layer Security (TLS), 
Virtual Private Network (VPN) authentication, and smart 
card-based logon. An authentication key pair is generally 
used in a network environment and over the long term 
(e.g., up to 3 years). 

ii) Public/Private Signature Key Pair: The private key of 
the key pair is used by one party to digitally sign a 
message/data, while the corresponding public key is 
used to verify the signature. Examples of the usage 
of a signature key pair are signed Secure/Multipart 
Internet Mail Extensions (S/MIME) messages, signed 
electronic documents, and signed code. In some 
implementations, a key pair may be used for both 
authentication and signature functions. A signature key 
pair is generally used in a network environment and over 
the long term (e.g., up to 3 years). It may also be used 
to generate and verify signatures on stored data. 


iii) Public/Private Key Establishment Pair: This key pair 
is used to securely establish a key between parties. 
Examples of the use of a key pair for key establishment 
are encrypting the symmetric key for S/MIME 
payload encryption/decryption and encrypting the random 
secret to be sent from a TLS client to a server. It 
is recommended that key establishment key pairs be 
distinct from authentication and signature key pairs. 
However, it is recognized that some devices such as 
web servers use the same key pair for key establishment 
and authentication. A key establishment key pair 
is traditionally used in a network environment, but 
some usage for stored data is also seen and can be 
envisioned. A key establishment key pair is generally 
used for a pre-defined period for encryption (e.g., up 
to 3 years), but is used for decryption for as long as 
the confidentiality of the data needs to be protected. 


iv) Symmetric Encryption/Decryption Key: A symmetric
key is used to encrypt and decrypt data or messages.
For data-in-transit, a symmetric encryption/decryption
key may have a short life, typically for each message
(e.g., S/MIME message) or for each session (for example
a TLS session). For stored data, the life of the
symmetric encryption/decryption key tends to be as long
as the confidentiality of the data needs to be protected.

v) Symmetric Message Authentication Code (MAC) Key:
A symmetric key is used to provide assurance for the
integrity of data. There are three techniques used to
provide this assurance:

• use a symmetric encryption algorithm and a MAC
mode of operation (e.g., CMAC using AES),

• use a symmetric encryption algorithm and an
authenticated encryption mode of operation (e.g.,
GCM or CCM using AES),

• use a hash-based MAC (HMAC).

For data-in-transit, a symmetric MAC key has a short
life, typically for a single message or for a single
session (for example a TLS session). For stored data,
the life of a symmetric MAC key tends to be for as
long as the data needs to be protected. Note that when
an authenticated encryption mode is used, the same key is
used for both MAC and encryption/decryption, since
both objectives are achieved by invoking a single mode
of operation.

vi) Symmetric Key Wrapping Key: A symmetric key is
used to encrypt a symmetric key or an asymmetric
private key. A Key Wrapping Key is also called a Key
Encrypting Key.
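The third MAC technique listed above (HMAC) can be illustrated with a minimal Python sketch built on the standard `hmac` module. The choice of SHA-256, the 32-byte key length and the helper names `make_tag` and `verify_tag` are assumptions made for this illustration and do not come from the source.

```python
import hashlib
import hmac
import secrets


def make_tag(mac_key: bytes, message: bytes) -> bytes:
    """Return an HMAC-SHA256 tag computed over the message."""
    return hmac.new(mac_key, message, hashlib.sha256).digest()


def verify_tag(mac_key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = make_tag(mac_key, message)
    return hmac.compare_digest(expected, tag)


# A short-lived MAC key, e.g. one per session (assumed: 32 random bytes).
session_mac_key = secrets.token_bytes(32)
message = b"transfer 100 units to account 42"
tag = make_tag(session_mac_key, message)
```

A receiver holding the same MAC key calls `verify_tag` on the received message and tag; any change to the message, or use of a different key, makes the check fail.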


B. Key States 


A symmetric key or public/private key pair can undergo the 
following states. 


Generation: A symmetric key or public/private key pair 
is generated when required. 

Activation: A symmetric key or private key is activated 
when it is required to be used. A public key is activated 
when it is made available or on the date indicated in its
associated metadata.

Deactivation: A symmetric key or private key is de- 
activated when it is no longer required for applying 
cryptographic protection to data. Deactivation of these 
keys may be followed by destruction or archival. A public 
key is not deactivated. 

Suspension: A key may be suspended from use for a
variety of reasons, such as an unknown status of the key
or due to the key owner being temporarily away.

Expiration: A key may expire due to the end of its
cryptoperiod.

Destruction: A key is destroyed when it is no longer 
needed. 

Archival: A key may be archived when it is no longer
required for normal use, but may be needed after the
key's cryptoperiod.

Revocation: A revocation is explicitly stated with respect
to public keys; however, the revocation also applies to
the corresponding private key. Revocation information is
securely communicated to the relying parties.


The following is the state diagram for the key states: 


Fig. 2. State Diagram for Key States 
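The key states described above can be modelled as a small state machine. The following Python sketch is illustrative only: the transition table `ALLOWED` is an assumption inferred from the textual descriptions of the states (it is not copied from Fig. 2), and the class name `ManagedKey` is hypothetical.

```python
from enum import Enum


class KeyState(Enum):
    GENERATED = "generated"
    ACTIVE = "active"
    SUSPENDED = "suspended"
    DEACTIVATED = "deactivated"
    EXPIRED = "expired"
    REVOKED = "revoked"
    ARCHIVED = "archived"
    DESTROYED = "destroyed"


# Assumed transitions, inferred from the state descriptions in the text.
ALLOWED = {
    KeyState.GENERATED: {KeyState.ACTIVE, KeyState.DESTROYED},
    KeyState.ACTIVE: {KeyState.SUSPENDED, KeyState.DEACTIVATED,
                      KeyState.EXPIRED, KeyState.REVOKED},
    KeyState.SUSPENDED: {KeyState.ACTIVE, KeyState.REVOKED},
    KeyState.DEACTIVATED: {KeyState.ARCHIVED, KeyState.DESTROYED},
    KeyState.EXPIRED: {KeyState.ARCHIVED, KeyState.DESTROYED},
    KeyState.REVOKED: {KeyState.ARCHIVED, KeyState.DESTROYED},
    KeyState.ARCHIVED: {KeyState.DESTROYED},
    KeyState.DESTROYED: set(),
}


class ManagedKey:
    """Tracks the lifecycle state of one key (hypothetical helper)."""

    def __init__(self, key_id: str):
        self.key_id = key_id
        self.state = KeyState.GENERATED
        self.history = [KeyState.GENERATED]

    def transition(self, new_state: KeyState) -> None:
        """Move to new_state, rejecting transitions not in ALLOWED."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"{self.state.name} -> {new_state.name} is not allowed")
        self.state = new_state
        self.history.append(new_state)
```

Keeping the table separate from the class makes it easy to adjust the permitted transitions to match whatever lifecycle a given key management policy actually prescribes.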


C. Key Management Functions 


The following are the important key management functions: 


Generate Key: The generation of good-quality keys is 
critical to security. 

Generate Domain Parameters: Discrete Logarithm-based 
algorithms require the generation of domain parameters 
prior to the generation of the keys; the keys are generated 
using those domain parameters. 

Bind Key and Metadata: This function provides assurance
that the key is associated with the correct metadata.

Bind a Key to an Individual: The identifier of the individual
or other entity that owns a key is considered part
of the key's metadata, but this association is sufficiently
critical to be listed as a distinct function.

Activate Key: This function transitions a key to the active
state. It is often done in conjunction with key generation.

Deactivate Key: This function is generally done when
a key is no longer needed for applying cryptographic
protection, for example, when a key has expired or is
replaced by another key.

Backup Key: A key is backed up by the owner, the key
management infrastructure, or a third party in order to
reconstitute the key when it is accidentally destroyed or
otherwise unavailable.

Recover Key: This function is complementary to the key 
backup function and is invoked when the key is unavail- 
able for some reason and is required by the authorized 
parties. 

Modify Metadata: This function is invoked when meta- 
data bound to a key needs to change. 

Rekey: This function is used to replace the existing key 
with a new key. 

Suspend a Key: This function is used to temporarily cease 
the use of a key. 

Restore a Key: This function is used to restore a sus- 
pended key once its secure status is ascertained. 

Revoke a Key: This function is used to inform the relying
parties to stop using a public key.

Archive a Key: This function is used to store a key in
long-term storage after it has been deactivated, expired,
and/or compromised.

Destroy a Key: This function is used to zeroize a key
when it should no longer be used.

Manage TA Store: This function is used by the relying
party to determine which trust anchors (TAs) to trust for
which purpose.


D. Key Management - Generic Security Requirements


The following are general key management security require- 
ments: 


i) Parties performing key management functions are 
properly authenticated and their authorizations to per- 
form the key management functions for a given key 
are properly verified. 

ii) All key management commands and associated data 
are protected from spoofing, i.e., source authentication 
is performed prior to executing a command. 

iii) All key management commands and associated data 
are protected from undetected, unauthorized modifi- 
cations, i.e., integrity protection is provided. 

iv) Secret and private keys are protected from unautho- 
rized disclosure. 

v) All keys and metadata are protected from spoofing, 
i.e., source authentication is performed prior to ac- 
cessing keys and metadata. 

vi) All keys and metadata are protected from undetected,
unauthorized modifications, i.e., integrity protection is
provided.

vii) When cryptography is used as a protection mechanism
for any of the above, the security strength of the
cryptographic mechanism used is at least as strong
as the security strength required for the keys being
managed.
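The last requirement above (the protecting mechanism must be at least as strong as the keys being managed) can be expressed as a simple comparison. The sketch below is illustrative: the strength figures are the comparable-strength estimates commonly quoted from NIST SP 800-57 Part 1, and the table name, algorithm labels and function name are assumptions made for this example.

```python
# Approximate security strength in bits (assumed values, following the
# comparable-strength estimates commonly quoted from NIST SP 800-57 Part 1).
SECURITY_STRENGTH_BITS = {
    "AES-128": 128,
    "AES-192": 192,
    "AES-256": 256,
    "RSA-2048": 112,
    "RSA-3072": 128,
    "ECDH-P256": 128,
    "ECDH-P384": 192,
}


def may_protect(protecting_alg: str, managed_key_alg: str) -> bool:
    """True if the protecting mechanism is at least as strong as the key."""
    protecting = SECURITY_STRENGTH_BITS[protecting_alg]
    managed = SECURITY_STRENGTH_BITS[managed_key_alg]
    return protecting >= managed
```

For example, `may_protect("AES-128", "AES-256")` is false: wrapping a 256-bit key under a 128-bit key would reduce the effective protection of the wrapped key to that of the weaker key.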


IV. KEY MANAGEMENT CHALLENGES 


Key management is the process whereby the keys in a
cryptosystem are managed, including their generation,
exchange, use and disposal. This includes various protocols and
procedures that aim to keep the keys secure at all times,
granting access to them only to authorised entities. Key
management is, however, often regarded as the hardest part of
cryptography, and is frequently the Achilles' heel of an
otherwise secure system (Schneier, 1996). Cloud encryption is
no exception, and appropriate key management mechanisms must be
in place in order to ensure the security of the data
encryption processes [1].

When implementing a key management system, one needs
to consider where the encryption keys should be stored. These
keys may be stored in one of three main locations:

Placing keys in an enterprise datacenter: When an
enterprise datacenter is used to store the keys, they are maintained
under the organisation's own high-security controls, which reduces
the risk of compromise. Such a system does not carry the risk of relying
on a third party that might potentially be compromised.

Using SaaS to manage keys: Keys may alternatively be stored
using a SaaS key management solution where the application
provider will manage and store the keys. This high dependence
upon the provider raises a number of security concerns,
including the possibility of having the keys unavailable in case
of an outage.

Using IaaS to manage keys: A third approach is to use the
encryption and key management services provided by IaaS.
Like the SaaS option, such a system results in strong reliance
upon the provider if the customer decides to allow the provider
to manage the keys. However, some IaaS providers also
provide their customers with the opportunity to manage the
encryption keys themselves, resulting in better separation of
duties. Amazon S3 Storage (Amazon Web Services, 2012) is
one such example.

Due to its complexity, key management introduces a number 
of challenges: 


• Key generation and storage

• Key availability

• Key disposal and expiration

• Key management policy

• Separation of duties

• Key management interoperability


A. Inherent Challenges 


• Key Generation and Storage: Generation of keys used
during encryption needs to be done in a secure manner,
since if the keys are compromised from the very start,
the security of the whole encryption process is
at risk, which in turn threatens data
confidentiality. Key management techniques which ensure
that the generated keys are strong and securely stored
also need to be in place.

• Key Availability: Once keys are generated and used to
encrypt data, one also needs to ensure that these keys are
available whenever needed, since their absence would
be a threat to availability. This is particularly important
because if the encryption and decryption keys are lost, the
data might effectively be rendered inaccessible, resulting
in temporary or permanent data loss. High availability is,
therefore, one of the essential considerations in any key
management system, including that for Cloud encryption.

• Key Disposal and Expiration: Another challenge associated
with key management is the disposal and revocation
of keys once these are no longer required. Procedures
should be in place which allow keys to be revoked when
particular entities who had access to the key should no
longer maintain this access. Similarly, keys might also
need to be revoked if they become compromised, preventing
them from being used for further cryptographic
processes. As with any cryptosystem, keys used for Cloud
encryption might also expire after a predefined lifetime,
since keys often have an associated expiry
date to protect against cryptanalytic attacks. This key
expiration introduces a number of challenges, such as
how to manage this process and what operations should
be performed on data that was encrypted with the expired
key.

• Key Management Policy: Key management processes
should be part and parcel of a holistic security policy that
is adopted by the Cloud consumer or by the provider. This
is a challenge in itself, since the policy must be consistent
or complementary across the different players involved
in Cloud encryption, something which becomes more
complicated as the number of entities involved increases.

• Separation of Duties: When considering Cloud encryption,
the security principle of separation of duties
should be implemented, whereby key management processes
should not fall under the Cloud provider's responsibilities.
By having the keys and the data fall under
the responsibility of separate entities, it becomes more
difficult for an attacker to obtain both a copy of the
encrypted data and the keys required to decrypt the
data, since these are not kept together. This is also true
in the case of any insider attacks that might originate
from within the provider. Moreover, separation of duties
also protects against threats to confidentiality, integrity
and compliance, and against threats created by having a
shared Cloud infrastructure.

• Key Management Interoperability: Interoperability of
key management solutions is yet another consideration,
which is of particular importance in the Cloud since
several entities might be involved in Cloud encryption
and key management processes. Key management
interoperability is a challenge since different providers and
encryption algorithms typically have different key
management requirements. Interoperability helps avoid being
permanently tied to a specific key management provider,
and it also allows for reuse of existing systems, making
Cloud management easier and less prone to security risks.
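The key disposal and expiration challenge described above can be made concrete with a small policy check. This is a minimal sketch under stated assumptions: the record fields, the function name `required_action` and the three outcomes ("none", "rekey", "revoke") are hypothetical choices for illustration; the source does not prescribe what should happen to data encrypted under an expired key.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class KeyRecord:
    """Minimal metadata kept for one key (hypothetical structure)."""
    key_id: str
    activated_at: datetime
    cryptoperiod: timedelta
    compromised: bool = False

    def expires_at(self) -> datetime:
        """The end of the cryptoperiod, counted from activation."""
        return self.activated_at + self.cryptoperiod


def required_action(key: KeyRecord, now: datetime) -> str:
    """Decide what a key manager should do with the key at time `now`.

    Assumed policy: a compromised key is revoked immediately; a key past
    its cryptoperiod is replaced (rekey); otherwise nothing is done.
    """
    if key.compromised:
        return "revoke"
    if now >= key.expires_at():
        return "rekey"
    return "none"
```

A scheduler could run `required_action` periodically over all key records; the result only signals which key management function to invoke, and the handling of data already encrypted under the old key remains a separate policy decision.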


B. Cryptographic Key Management Challenges in the Cloud 


The secure management of the resources associated with
cloud services is a critical aspect of cloud computing.
Cryptographic operations form one of the main tasks of secure
management. Hence, while cloud services provide ubiquitous
computing, elastic capabilities and self-configurable resources
at lower costs, they also entail performing several
cryptographic operations (from a cloud Consumer perspective) for
the following:

• Secure interaction of the cloud Consumer with various
services, and

• Secure storage of data generated/processed by those
services.

The key management system (KMS) required to support
cryptographic operations for the above functions can be
complex, due to differences in ownership and control of the underlying
infrastructures on which the KMS and the protected resources
are located.

Cryptographic key management issues arise due to
the distributed nature of IT resources, as well as the distributed
nature of their control, the latter being split among multiple cloud
actors. Furthermore, the pattern of distribution varies with the
type of service offering - Infrastructure as a Service (IaaS),
Platform as a Service (PaaS) and Software as a Service (SaaS).
The challenges involved in deploying cryptographic
key management functions that meet the security requirements
of the cloud Consumers depend upon the nature of the
service and the type of data generated, processed or stored by the
service features.

1) Challenges in Cryptographic Operations and Key Management
for IaaS: In the IaaS cloud type, the Consumer
deploys its own computing resources in the form of virtual
machines (VMs) or leases them from the cloud Provider.
The leasing option involves checking out pre-built images
offered by an IaaS cloud Provider. The VM images that are
checked out must be authenticated to ensure that they are
from authorized sources and have not been tampered with.
After a VM is configured, it has to be launched in the cloud
Provider's infrastructure to become a running VM instance. The
operation of launching the VM and the subsequent lifecycle
operations on the VM (such as Stop, Pause, Restart, Kill, etc.)
are performed by the IaaS cloud Consumer through access
to the management interface of the Hypervisor. Additionally,
during operations or the use of cloud services, the IaaS cloud
Consumer has to interact with running VM instances in a
secure manner. These three operations - checking out a VM,
performing lifecycle operations (including launching) on a
VM instance and secure interaction with it - are performed
by designated service-level administrators of the IaaS cloud
Consumer. IaaS cloud service security capabilities (SC) that
enable these operations are:


• IaaS-SC1
• IaaS-SC2
• IaaS-SC3


For each of the three security capabilities identified above,
possible key management challenges are presented below that
are based on known secure functions or protocols.


i) IaaS-SC1: The ability to authenticate pre-defined VM
Image Templates made available by a cloud Provider
for building functional, customized VM instances that
meet a cloud Consumer's needs (Server Authentication
Mechanism).
Key Management Challenges: The authentication of
the VM templates using a digital
signature, cryptographic hash function, or message
authentication code entails the bootstrapping problem
and hence requires a comprehensive security analysis,
rather than just an examination of the key management
challenge.

ii) IaaS-SC2: The ability to authenticate the API calls
sent by the cloud Consumer to the VM Management
interface of the cloud Provider's Hypervisor environment.
Key Management Challenge: Cloud Consumers need
to secure the private key of the public/private key pair
that is used to sign the VM Management commands
on their system.

iii) IaaS-SC3: The ability to secure the communication
while performing administrative operations on VM
instances.
Key Management Challenges: Cloud Consumers need
to secure the private key of the public/private key pair
that is used to authenticate themselves, using the best
enterprise security mechanisms. It is important to note
that the Diffie-Hellman keys and the derived session
keys are ephemeral and generated or calculated on the
fly. Thus, these keys do not require persistent storage,
and hence, their key management is not an issue.
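For IaaS-SC1, one of the mechanisms named above is a cryptographic hash function. The following Python sketch shows only the comparison step: it hashes a VM image stream with SHA-256 and compares the result against a digest assumed to have been obtained from the Provider over a trusted channel. How that trusted digest is obtained is exactly the bootstrapping problem mentioned in the text and is not solved here; the function names are hypothetical.

```python
import hashlib
import hmac
import io


def sha256_of_stream(stream, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large images fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def image_is_authentic(stream, trusted_hex_digest: str) -> bool:
    """Compare the image digest with a digest from a trusted source."""
    actual = sha256_of_stream(stream)
    return hmac.compare_digest(actual, trusted_hex_digest.lower())


# Stand-in for a real image file and its published digest (illustrative).
image_bytes = b"example VM image contents"
published_digest = hashlib.sha256(image_bytes).hexdigest()
```

In practice the stream would be an image file opened in binary mode. A bare hash only detects tampering if the reference digest itself is trustworthy, which is why the text calls for a comprehensive security analysis rather than a key management fix alone.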


The challenges in the secure interaction of the application
users of IaaS cloud Consumers with IaaS cloud services are:


i) IaaS-SC4: The ability to secure the communication
with application instances running on VM instances
for application users during cloud service usage.
Key Management Challenges: The secure session requires
the presence of an asymmetric key pair (private
and public keys) for a service instance and an optional
key pair on the client side as well.

ii) IaaS-SC5: The ability to store static application
support data securely.
Key Management Challenge: The encryption keys
(generally symmetric keys) needed for encrypting the
data reside at the cloud Consumer site and are under its
administrative control; they can thus be secured using
enterprise key management solutions.

iii) IaaS-SC6: The ability to store application
data in a structured form securely.
Key Management Challenge: Since the IaaS cloud
Consumer has administrative control of the subscribed
DBMS instance, it has control over the data encryption
key (DEK) as well.

iv) IaaS-SC7: The ability to store unstructured application
data securely. This operation requires storage-level
encryption similar to Transparent/External encryption
and hence, the same key management challenges apply.

2) Challenges in Cryptographic Operations and Key Management
for PaaS: The objective of a Platform as a Service
(PaaS) offering is to provide a computational platform and
the necessary set of application development tools to Consumers
for developing or deploying applications. The
underlying OS platform on which the development tools are
hosted is known to the Consumer. Consumers interact with
these tools to develop custom applications. Consumers may
also need a storage infrastructure to store both supporting data
and application data for testing the application functionality.
PaaS cloud service security capabilities (SC) that enable these
operations are:


• PaaS-SC1: The ability to set up secure interaction with
deployed applications and/or development tool instances,

• PaaS-SC2: The ability to securely store static data (data
not directly processed by applications),

• PaaS-SC3: The ability to securely store application data
in a structured form (e.g., relational form) using a
Database Management System (DBMS), and

• PaaS-SC4: The ability to securely store application data
that is unstructured.


The operations involved in exercising the above capabilities 
(PaaS-SC1 through PaaS-SC4) are identical to the operations 
involved in exercising capabilities IaaS-SC4 through IaaS- 
SC7, respectively. Therefore, the same cryptographic key 
management challenges apply. 

3) Challenges in Cryptographic Operations and Key Management
for SaaS: SaaS offerings provide access to applications
hosted by the cloud Provider. An SaaS cloud Consumer
would like to interact with these application instances securely
and exercise the various application features, depending upon
the set of assigned permissions or by assuming their assigned
roles. In addition, some SaaS Consumers would also like to
store the data generated/processed by those applications in an
encrypted form for the following reasons:


• to prevent exposure of their corporate data due to loss
of the media used by cloud Providers,

• to prevent surreptitious viewing of their data by an SaaS
co-tenant or by a cloud Provider administrator.


Though the first feature (secure interaction) is provided by SaaS Providers,
the second feature (storing data in an encrypted form) currently
has to be provided entirely by the SaaS Consumer. The typical
set of security capabilities (whether provided by an SaaS
service or not) are:


• SaaS-SC1: The ability to set up secure interaction with
an application. The operations involved in exercising the
SaaS-SC1 capability are identical to the operations involved in
exercising the IaaS-SC4 capability. Therefore, the same
cryptographic key management challenges apply.

• SaaS-SC2: The ability to store application data (structured
or unstructured) in an encrypted form.


Key Management Challenges: The encryption gateway may
use a single key or different cryptographic keys for
encrypting/decrypting different selected fields of the application.
Irrespective of the number of cryptographic keys used, since
the encryption gateway resides within the enterprise network
perimeter, all cryptographic keys are fully under the control
of the SaaS cloud Consumer and, as such, protected using
in-house enterprise key management policies and practices.
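The option of using different keys for different fields can be sketched as follows. This is only one possible arrangement and is an assumption of this example, not something the source prescribes: a single master key stays inside the enterprise perimeter and per-field keys are derived from it with HMAC-SHA256 used as a simplified key derivation step. The function name `derive_field_key`, the label prefix and the field names are hypothetical.

```python
import hashlib
import hmac
import secrets


def derive_field_key(master_key: bytes, field_name: str) -> bytes:
    """Derive a 32-byte key for one field (simplified KDF, assumption)."""
    label = b"field-key:" + field_name.encode("utf-8")
    return hmac.new(master_key, label, hashlib.sha256).digest()


# The master key is generated and held by the SaaS cloud Consumer.
master_key = secrets.token_bytes(32)

# One derived key per selected application field (illustrative names).
field_keys = {
    name: derive_field_key(master_key, name)
    for name in ("email", "salary")
}
```

With this arrangement only the master key needs persistent protection under the enterprise key management policy, since the per-field keys can be recomputed on demand. A gateway that instead stores independent random keys per field would need to manage each of them separately.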


C. Cloud Model Analysis 


The challenges that relate to Cloud key management need 
to be considered in the light of the particular Cloud implemen- 
tation being used, such as the service and deployment model. 

The challenge of secure key generation and storage applies
to any Cloud model used. However, different models usually
result in different entities being responsible for these operations.
In conceptual Model A, being a public SaaS system, the
encryption keys are generated and kept by the Cloud provider
or by a third-party proxy, seeing that the consumer is not
involved in the data encryption process. In Model B, which is
a hybrid SaaS model, the Cloud consumer can be responsible
for key generation and storage, especially for the keys needed
for the encryption of sensitive data that is stored on premise. In
PaaS and IaaS, on the other hand, key generation and storage
are usually managed mainly by the Cloud consumer. In Model
C, which assumes an application that runs on a public PaaS
infrastructure, application-level encryption keys are managed
by the consumer application. Similarly, keys used to encrypt
volumes in the public IaaS Model E are often also generated
and kept by the consumer, in order to keep them separate
from the volume that is stored by the Cloud provider. In fact,
Cloud consumers might keep these keys on premise if they
already have the necessary setup, such as an enterprise key
management system. This is also necessary in the private IaaS
Model D, where the whole infrastructure is managed by the
Cloud consumer organisation. Irrespective of which entity is
responsible for key generation and storage, this is a challenge
that needs to be addressed when considering key management
in the Cloud.
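The assignment of key generation and storage responsibilities across the conceptual Models A to E, as described in the paragraph above, can be summarised in a small lookup structure. The sketch below only restates that paragraph; the class and field names are hypothetical, and the `consumer_managed` flag reflects the typical case stated in the text rather than a fixed rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloudModel:
    """One conceptual Cloud model and its usual key custodian."""
    service: str
    deployment: str
    key_custodian: str
    consumer_managed: bool


# Summary of the paragraph above (typical cases, not fixed rules).
MODELS = {
    "A": CloudModel("SaaS", "public",
                    "Cloud provider or third-party proxy", False),
    "B": CloudModel("SaaS", "hybrid",
                    "Cloud consumer, for sensitive on-premise data", True),
    "C": CloudModel("PaaS", "public", "consumer application", True),
    "D": CloudModel("IaaS", "private", "consumer organisation", True),
    "E": CloudModel("IaaS", "public",
                    "Cloud consumer, separate from the volume", True),
}


def models_with_consumer_keys() -> list:
    """Return the identifiers of models where the consumer holds keys."""
    return [mid for mid, m in sorted(MODELS.items()) if m.consumer_managed]
```

Such a table makes it easy to see that Model A is the only one of the five in which the consumer is not involved in key generation and storage.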

Similarly, key availability needs to be addressed irrespec- 
tive of which Cloud service or deployment model is being 
used. Mechanisms should be in place to ensure that the 
keys are available, even in situations where the primary key 
management services are distributed across different points 
in the Cloud. Similarly, the challenge of key expiration and 
disposal applies in all Cloud models, although it becomes more 
complicated to manage when the keys are not under the control 
of the entity that is providing the encryption service. In SaaS 
Model A, for instance, the Cloud provider not only stores and 
processes the data, but is also responsible for the encryption 
and key management processes. This makes key expiration 
easier to handle, since everything is under the control of a 
single entity. On the other hand, if the different processes are 
performed by different parties, such as in Model E where the 
keys used to encrypt the public IaaS volume can be managed 
by the Cloud consumer, then better communication and policy 
enforcement techniques need to be in place to ensure that 
expired or disposed keys are not used. 

When the Cloud consumer is practically the only entity 
involved in key management, such as in Model D which is a 
private IaaS system, key management policies are also easier 
to manage. However, policy becomes more complicated as 
third parties, including proxies and key escrow services, are 
introduced into the process. For instance, in Model B, the 
keys used to encrypt the data in the hybrid SaaS system 
might need to be shared and distributed between the various 
actors involved. The key management policy, therefore, covers 
various entities and it potentially no longer falls under the 
responsibility of a single organisation, making this more 
complex to manage. 

A further challenge in Cloud key management is separation 
of duties. Whilst separation of duties is desirable, one needs 
to keep in mind that involving more entities in encryption 
and key management introduces more complexity in the processes.
This principle of separation of duties poses slightly
different challenges in the various Cloud service models. In
conceptual Model A, encryption is performed by the public
SaaS application, and key management would therefore need
to be separated from the Cloud provider that stores the data. A
third-party proxy that is entrusted with key management and
encryption can be used, whilst storage will be handled by the
provider. On the other hand, in IaaS, mechanisms are necessary
to store the volume encryption keys separately from the Cloud 
infrastructure. In conceptual Model E, being a public IaaS 
model, these keys can possibly be managed by third parties or 
by the consumer using an enterprise key management system, 
while the provider stores the actual encrypted volume. The 
concept of separation of duties does not necessarily apply 
to Model D, which is a private IaaS model. The decision 
on whether or not to outsource key management depends 
on the sensitivity of the data, the degree of trust that the 
organisation has in its employees, and the security posture 
of the organisation. 

The challenge of key management interoperability is also 
influenced by the Cloud deployment model used. For instance, 
in Model B, which is a hybrid SaaS Cloud, there is a strong 
interaction between the private and public components of the 
Cloud infrastructure. In this model, it is possible for the 
Cloud consumer to encrypt the data in the private Cloud 
and transmit this to the SaaS application still in encrypted 
format. Interoperable key management makes it possible for 
the provider to fetch the key and decrypt the data. In a hybrid 
Cloud where multiple entities are involved in key management, 
interoperability therefore becomes more challenging yet neces- 
sary as it helps simplify and streamline the key management 
operations across the various encryption systems. Moreover, 
when the Cloud consumer chooses to trust third parties to 
perform key management duties on its behalf, the use of 
interoperable practices enables the consumer to change with 
ease the entity that is entrusted to perform key management, 
whenever necessary. 

Although “the Cloud may create new key management 
challenges, the principles for choosing between the various 
alternatives remain the same” (Thiemann, 2012) as with any 
cryptosystem. Key management is an essential component
of Cloud encryption, and it should be carefully considered
in order to ensure that the encrypted data's confidentiality is
maintained. Based upon the organisation's risk appetite and its
security requirements, the best key management techniques
and practices should be selected and adopted by the organisation.


V. CONCLUSION 


This paper presents an overview of cloud computing, secure
data sharing in the cloud and the cryptographic key management
challenges in the cloud. Data sharing and collaboration in the Cloud
are fast becoming widely available as the demand
for data sharing continues to grow rapidly. From the survey
we understand that encryption and access control are the
two primary means for ensuring data confidentiality in any
IT environment. In situations where encryption is used as a
data confidentiality assurance measure, the management of
cryptographic keys is a critical and challenging security
management function, especially in large enterprise data centers,
due to the sheer volume and distribution of data (in different physical
and logical storage media), and the consequent number of
cryptographic keys. This function becomes more complex in
the case of a cloud environment, where the physical and
logical control of resources (both computing and networking)
is split between cloud actors (e.g., Consumers, Providers, and
Brokers).

Proper key management in the Cloud can lead to more
secure and confidential sharing of data in the Cloud. A poor
key management system can make the Cloud
unreliable and can cost it the trust of its consumers. Hence
it is imperative that more research be done towards achieving
more robust key management for the Cloud, not only to
attract more consumers and build trust but also to provide
a foundation for secure and private data sharing in the Cloud.

Public and private organizations which want to take advantage
of cloud-based solutions to reduce costs and improve
business performance have to implement security mechanisms
in order to secure their data. The basis for this process is
the encryption process and the critical point in this process is
the key management solution. Cloud service providers like
Amazon, Microsoft, IBM, VMware and other companies saw
the importance of this and developed or adopted different key
management solutions.

Abstract—Proxy Re-encryption has been used since the need
for forwarding an encrypted message to a party for whom it was
not encrypted was highlighted in the form of delegation rights
by Blaze, Bleumer and Strauss. Various Proxy Re-Encryption
schemes have been introduced to date, mainly focusing on
demonstrating features like transitivity and collusion-resistance
to ensure minimal trust in the proxy and maximum key-privacy.
This survey highlights some major schemes introduced, classifies
them based on their directionality, brings to light their major
advantages and disadvantages, and provides a detailed comparative
study based on the key features a Proxy Re-Encryption
scheme must possess for its widespread adoption.

Index Terms—Re-encryption, Decryption, Schemes, Bilinear 
maps, CCA secure, Collusion resistance, Key privacy, CPA 
secure, Delegation rights, Proxy Re-Encryption, Transitivity. 


I. CLOUD COMPUTING 


Systems can be used to enable data sharing capabilities
and this can provide an abundance of benefits to the user.
There is currently a push for IT organisations to increase their
data sharing efforts. According to a survey by Information
Week, nearly all organisations shared their data somehow, with
74% sharing their data with customers and 64% sharing
with suppliers. A fourth of the surveyed organisations consider
data sharing a top priority. The main benefit organisations can gain
from data sharing is higher productivity. With multiple users
from different organisations contributing to data in the Cloud,
the time and cost will be much lower than when data has
to be exchanged manually, which also creates a clutter of
redundant and possibly out-of-date documents.


A. Different types of Cloud models 


Cloud computing is typically classified in two ways: 
e Location of the cloud computing (Deployment Model) 
e Type of services offered( Service Model) 
1) Deployment Models: 
i) Private cloud: The computing infrastructure is dedi- 
cated to a particular organization and not shared with 


other organizations. Some experts consider that private 
clouds are not real examples of cloud computing. 
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Private clouds are more expensive and more secure 
when compared to public clouds. 

Private clouds are of two types, On-premise private 
clouds and externally hosted private clouds. Externally 
hosted private clouds are also exclusively used by 
one organization, but are hosted by a third party 
specializing in cloud infrastructure. Externally hosted 
private clouds are cheaper than On-premise private 
clouds. 

Public cloud: In Public cloud the computing infras- 
tructure is hosted by the cloud vendor at the vendors 
premises. The customer has no visibility and control 
over where the computing infrastructure is hosted. 
The computing infrastructure is shared between any 
organizations. 

iii) Hybrid cloud: Organizations may host critical applications
on private clouds and applications with
relatively fewer security concerns on the public cloud.
The use of private and public clouds together is
called hybrid cloud. A related term is cloud bursting.
In cloud bursting, organizations use their own computing
infrastructure for normal usage but access the
cloud, using services like Salesforce cloud computing,
for high or peak load requirements. This ensures that a
sudden increase in computing requirements is handled
gracefully.

iv) Community cloud: This involves sharing computing
infrastructure between organizations of the same
community. For example, all Government organizations
within the state of California may share computing
infrastructure on the cloud to manage data related
to citizens residing in California.


2) Service Models: 


i) Infrastructure as a service (IaaS): This involves
offering hardware-related services using the principles
of cloud computing. These could include some kind of
storage services (database or disk storage) or virtual
servers. Leading offerings that provide Infrastructure as
a service are Amazon EC2, Amazon S3, Rackspace
Cloud Servers and Flexiscale.


[Figure: layered service stack. Cloud clients: web browser,
mobile app, thin client, terminal emulator. SaaS (application):
CRM, email, virtual desktop, communication, games. PaaS
(platform): execution runtime, database, web server, development
tools. IaaS (infrastructure): virtual machines, servers, storage,
load balancers, network.]

Fig. 4. Classification based upon service provided


ii) Platform as a Service (PaaS): This involves offering
a development platform on the cloud. Platforms provided
by different vendors are typically not compatible.
Typical players in PaaS are Google's App
Engine, Microsoft's Azure and Salesforce.com's Force.com.

iii) Software as a service (SaaS): This includes a complete
software offering on the cloud. Users can access
a software application hosted by the cloud vendor on a
pay-per-use basis. This is a well-established sector.
The pioneer in this field has been Salesforce.com's
offering in the online Customer Relationship Management
(CRM) space. Other examples are online email
providers like Google's Gmail and Microsoft's Hotmail,
Google Docs, and Microsoft's online version of Office
called BPOS (Business Productivity Online Standard
Suite).


The above classification is well accepted in the industry.
David Linthicum describes a more granular classification on
the basis of service provided. These are listed below:


Storage-as-a-service 
Database-as-a-service 
Information-as-a-service 
Process-as-a-service 
Application-as-a-service 
Platform-as-a-service 
Integration-as-a-service 
Security-as-a-service 
Management/Governance-as-a-service 
Testing-as-a-service 
Infrastructure-as-a-service 


B. Secure Data Sharing


Cloud systems can be used to enable data sharing, which
provides several benefits to both users and organizations.
Since many users from various organizations contribute their
data to the Cloud, the time and cost are lower than with manual
exchange of data. Google Docs, for example, provides data
sharing capabilities: groups of students or teams working on
a project can share documents and collaborate effectively. This
allows higher productivity than the earlier practice of
frequently sending updated versions of a document to members
of the group as email attachments. People now expect
data sharing capability on their computers, phones and laptops,
and they like to share information with family, colleagues,
friends or the world. Students also benefit when working on
group projects, as they can team up with other members and
get work done efficiently.

1) Security requirements for data sharing: The Security 
requirements for data sharing in cloud computing system are 
as follows: 


i) Data security: The provider must ensure that the
data outsourced to the cloud is secure and must take
security measures to protect users' information in the
cloud.

ii) Privacy: The provider must ensure that all critical
data are encrypted and that only authorized users have
access to the data in its entirety. Credentials and digital
identities must be protected, as must any data that the
provider collects about customer activity in the cloud.

iii) Data confidentiality: Cloud users want to make
sure that their data contents are not made available or
disclosed to unauthorized users. Only authorized users
should be able to access sensitive data; others should
not learn any information about the data in the cloud.

iv) Fine-grained access control: The data owner can prevent
unauthorized users from accessing the data outsourced
to the cloud. The data owner grants different access
rights to a set of users, while others are not
allowed access without permission. In untrusted cloud
environments, access permissions should be controlled
only by the owner.

v) User revocation: When a user's access rights to the
data are revoked, that user should no longer be able to
access the data at any time. The user revocation must
not affect the other authorized users in the group.

vi) Scalable and efficient: Since the number of Cloud users
is extremely large and users join and leave unpredictably,
it is essential that the system maintain
efficiency as well as scalability. An effective data
sharing system in cloud computing must satisfy all
the security requirements.

vii) Collusion between entities: When considering data
sharing methodologies in the Cloud, it is vital that
even when certain entities collude, they should still
not be able to access any of the data without the
data owner's permission. Earlier works on
data sharing did not consider this problem; however,
collusion between entities can never be written off as
an unlikely event.

2) Ideal Requirements of Data Sharing in the Cloud: To
enable data sharing in the Cloud, it is imperative that only
authorized users are able to get access to data stored in the
Cloud. The ideal requirements of data sharing in the Cloud
are listed below:

i) The data owner should be able to specify a group of
users that are allowed to view his/her data.

ii) Any member of the group should gain access to the
data at any time without the data owner's intervention.

iii) No user other than the data owner and the
members of the group should gain access to the data,
including the Cloud Service Provider.

iv) The data owner should be able to revoke access to
data for any member of the group.

v) The data owner should be able to add members to the
group.

vi) No member of the group should be allowed to revoke
the rights of other members of the group or add new users
to the group.

vii) The data owner should be able to specify who has
read/write permissions on the data owner's files.
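
The ownership rules in this list can be sketched in a few lines of Python. This is only an illustrative model of the access decisions, with no cryptography; the class name `SharedGroup` and its methods are hypothetical and do not come from any of the surveyed schemes.

```python
class SharedGroup:
    """Toy model of an owner-managed sharing group (no cryptography)."""

    def __init__(self, owner):
        self.owner = owner
        self.permissions = {}  # member -> set of rights, e.g. {"read", "write"}

    def _require_owner(self, actor):
        # Only the data owner may change group membership.
        if actor != self.owner:
            raise PermissionError(f"{actor} is not the data owner")

    def add_member(self, actor, user, rights=("read",)):
        self._require_owner(actor)
        self.permissions[user] = set(rights)

    def revoke_member(self, actor, user):
        self._require_owner(actor)
        self.permissions.pop(user, None)

    def can(self, user, right):
        # The owner always has access; everyone else needs an explicit right.
        if user == self.owner:
            return True
        return right in self.permissions.get(user, set())


group = SharedGroup("alice")
group.add_member("alice", "bob", rights=("read", "write"))
```

In this sketch a member such as `bob` cannot add or revoke other members, and an outsider (including the Cloud Service Provider) has no rights unless the owner grants them. In a real system these rules would have to be enforced cryptographically, since the Cloud is not trusted to apply them.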


C. Types of Secure Data Sharing 


The traditional approach is mainly based on key management.
Recent approaches include:

• Attribute-Based Encryption

• Proxy Re-encryption

• Hybrid ABE and PRE

1) Key Management: Key management is anything you 
do with a key except encryption and decryption and covers 
the creation/deletion of keys, activation/deactivation of keys, 
transportation of keys, storage of keys and so on. Most Cloud 
service providers provide basic key encryption schemes for 
protecting data or may leave it to the user to encrypt their 
own data. 

A Cloud Key Management Infrastructure (CKMI) has been
proposed which contains a Cloud Key Management Client
(CKMC) and a Cloud Key Management Server (CKMS). The
protocol defines objects which contain keys, certificates,
etc.; the operations upon them, such as creation, deletion,
retrieval and updating of keys and certificates; and attributes
related to the object in question, such as the object identifier.
The method is effective for proper key management; however,
if the server is broken, all the users' data is lost, and there is
no proper backup and recovery mechanism.

2) Attribute-Based Encryption: Attribute-Based Encryption
(ABE) is an effective and promising technique used
to provide fine-grained access control to data in the Cloud.
Initially, access to data in the Cloud was provided through
Access Control Lists (ACLs); however, this was not scalable
and provided only coarse-grained access to data.

Attribute-Based Encryption, as proposed by Goyal et al.,
provides more scalable and fine-grained access control to data
than ACLs. Attribute-Based Encryption is an
access control mechanism in which a user or a piece of data
has attributes associated with it. An access control policy is
defined, and if the attributes satisfy the access control policy,
the user is able to get access to the piece of data.
There are two kinds of ABE:

• Key-Policy ABE (KP-ABE)

• Ciphertext-Policy ABE (CP-ABE)
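
The access decision that ABE enforces can be illustrated without any cryptography. The sketch below only checks whether a set of attributes satisfies a policy built from AND/OR gates; the policy encoding and the function name `satisfies` are assumptions made for illustration. In an actual ABE scheme this check is not run by a trusted party but is embedded in the mathematics of decryption.

```python
def satisfies(policy, attributes):
    """Return True if `attributes` satisfies `policy`.

    A policy is either an attribute name (a string) or a tuple
    ("and" | "or", sub_policy, sub_policy, ...).
    """
    if isinstance(policy, str):
        return policy in attributes
    op, *children = policy
    results = [satisfies(child, attributes) for child in children]
    if op == "and":
        return all(results)
    if op == "or":
        return any(results)
    raise ValueError(f"unknown policy operator: {op!r}")


# Hypothetical policy: a doctor working in cardiology or emergency care.
policy = ("and", "doctor", ("or", "cardiology", "emergency"))
```

In CP-ABE the policy is attached to the ciphertext and the attributes to the user's key, while in KP-ABE the roles are reversed: the policy is in the key and the attributes label the ciphertext.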

3) Proxy Re-encryption: Proxy Re-encryption is another
technique that is fast being adopted for enabling secure
and confidential data sharing and collaboration in the Cloud.
Proxy Re-encryption allows a semi-trusted proxy with a
re-encryption key to translate a ciphertext under the data owner's
public key into another ciphertext that can be decrypted by
another user's secret key. At no stage is the proxy able
to access the plaintext. Researchers have applied proxy
re-encryption to the Cloud, in particular for secure
and confidential data sharing and collaboration.

Suppose a user, say Alice, encrypts her data m using her public
key. When she wants to share the data with another user, say
Bob, she sends the encrypted data to a proxy. The proxy then
converts the data encrypted under Alice's public key into data
that is encrypted under Bob's public key and sends this to Bob.
Bob can now use his private key to decrypt the ciphertext and
reveal the contents.

4) Hybrid ABE and PRE: ABE and Proxy Re-encryption
have also been used in combination to provide
extra security and privacy for data sharing and collaboration
in the Cloud. A number of works in the literature combine
the power of the two schemes to provide a more robust
solution and to give the data owner further assurance of
the secure sharing of data in the Cloud.


II. PROXY RE-ENCRYPTION


Proxy re-encryption is one of the more recent techniques
that provide secure and confidential data sharing and
collaboration. It was introduced by Blaze, Bleumer and
Strauss. Proxy re-encryption is a relatively new data encryption
technique, devised primarily for distributed data and file
security.

The goal of proxy re-encryption is to allow the
re-encryption of one ciphertext into another ciphertext without
relying on or trusting the third party that performs the transfer.
Proxy re-encryption gives a semi-trusted proxy a
re-encryption key, which is used to translate a ciphertext encrypted
under the owner's public key into another ciphertext. The
re-encrypted ciphertext can be decrypted with another user's
secret key. At no stage does the proxy obtain the plaintext.

For example, suppose Alice wants to send data m to Bob. First,
Alice encrypts her data using her public key and sends
the encrypted data to the proxy. The proxy then re-encrypts
the ciphertext so that it is encrypted under Bob's public key
and sends it to Bob. Finally, Bob decrypts the re-encrypted
ciphertext using his private key.


In one construction, the owner's private key is divided into two
parts. One half is stored on the data owner's machine and the
other on the Cloud proxy. The data owner encrypts the data with
his half of the private key, and the result is encrypted again by
the proxy using the other half of the key. The user's private
key is likewise divided into two parts: one half is stored on the
user's machine and the other half on the Cloud proxy. The proxy
decrypts the ciphertext with its half of the user's private key,
and the result is decrypted again on the user's side to retrieve
the full plaintext. When the data owner wishes to revoke a user
from accessing the data, he simply informs the Cloud proxy
to remove the user's key piece.
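
The two-stage decryption with a split key can be illustrated with a toy ElGamal variant in which the private exponent is the sum of a user share and a proxy share. This is a sketch under stated assumptions, not the construction of any surveyed paper: the group parameters are tiny demonstration values, the function names are hypothetical, and only the user-side key split is shown.

```python
import secrets

# Toy parameters (assumed for illustration, far too small for real use):
# P = 2*Q + 1 is a safe prime and G generates the subgroup of prime order Q.
P, Q, G = 467, 233, 4


def keygen_split():
    """Create a key whose private exponent is split into two shares."""
    x_user = secrets.randbelow(Q - 1) + 1
    x_proxy = secrets.randbelow(Q - 1) + 1
    pk = pow(G, (x_user + x_proxy) % Q, P)
    return pk, x_user, x_proxy


def encrypt(pk, m):
    """Standard ElGamal encryption of an integer m with 1 <= m < P."""
    r = secrets.randbelow(Q - 1) + 1
    return pow(G, r, P), (m * pow(pk, r, P)) % P


def partial_decrypt(share, ciphertext):
    """Strip one key share from the ciphertext."""
    c1, c2 = ciphertext
    mask = pow(c1, share, P)
    return c1, (c2 * pow(mask, -1, P)) % P


pk, x_user, x_proxy = keygen_split()
ciphertext = encrypt(pk, 99)
half_open = partial_decrypt(x_proxy, ciphertext)   # done by the proxy
_, plaintext = partial_decrypt(x_user, half_open)  # done by the user
```

Neither share alone recovers the message, so if the proxy deletes its share for a user, that user can no longer decrypt. This matches the revocation step described above, and it also shows why collusion between a user and the proxy is dangerous: together they hold the whole key.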


A. Algorithm 


A Proxy Re-Encryption (PRE) scheme consists of the 
following algorithms: 


Step 1. $param \leftarrow Setup(1^k)$: on input a security
parameter $k \in \mathbb{N}$, output a set of system
parameters $param$.

Step 2. $(pk_i, sk_i) \leftarrow KeyGen(param)$: on input
$param$, output a public/private key pair $(pk_i, sk_i)$.
For simplicity, we implicitly regard $param$ as an input
for the following algorithms.

Step 3. $rk_{i \to j} \leftarrow ReKeyGen(sk_i, pk_j)$: on input
a private key $sk_i$ of a user $i$ and a public key $pk_j$
of another user $j$, output a re-encryption key
$rk_{i \to j}$ that can be used to transform a ciphertext
under $pk_i$ into another ciphertext under $pk_j$, where
$i, j \in \{1, \ldots, poly(1^k)\}$, $i \neq j$, and
$poly(1^k)$ is some polynomial in $k$.

Step 4. $C_i \leftarrow Enc(pk_i, m)$: on input a public key
$pk_i$ and a message $m$ from a message space $M(pk_i)$,
output a ciphertext $C_i$.

Step 5. $C_j \leftarrow ReEnc(rk_{i \to j}, C_i)$: on input a
re-encryption key $rk_{i \to j}$ and a ciphertext $C_i$,
output a re-encrypted ciphertext $C_j$.

Step 6. $m / \perp \leftarrow Dec(sk_i, C_i)$: on input a private
key $sk_i$ and a ciphertext $C_i$, output a message $m$ or
an error symbol $\perp$ indicating that $C_i$ is invalid.

Step 7. $m / \perp \leftarrow Dec_R(sk_j, C_j)$: on input a private
key $sk_j$ and a re-encrypted ciphertext $C_j$, output a
message $m$ or an error symbol $\perp$ indicating that
$C_j$ is invalid.
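
The interface above can be made concrete with a toy version of the ElGamal-style scheme of Blaze, Bleumer and Strauss. The sketch below is for illustration only and rests on several assumptions: the group parameters are tiny demonstration values, all function names are hypothetical, and, unlike $ReKeyGen(sk_i, pk_j)$ above, this particular scheme is bidirectional and derives the re-encryption key from both private keys.

```python
import secrets

# Toy parameters (assumed, far too small for real use): P = 2*Q + 1 is a
# safe prime and G generates the subgroup of prime order Q.
P, Q, G = 467, 233, 4


def keygen():
    """Return (pk, sk) with pk = G^sk mod P."""
    sk = secrets.randbelow(Q - 1) + 1
    return pow(G, sk, P), sk


def rekeygen(sk_i, sk_j):
    """Re-encryption key rk = sk_j / sk_i mod Q (needs both private keys)."""
    return (sk_j * pow(sk_i, -1, Q)) % Q


def enc(pk, m):
    """Encrypt an integer m with 1 <= m < P as (m * G^r, pk^r)."""
    r = secrets.randbelow(Q - 1) + 1
    return (m * pow(G, r, P)) % P, pow(pk, r, P)


def reenc(rk, ciphertext):
    """Turn a ciphertext under pk_i into one under pk_j; m is never exposed."""
    c1, c2 = ciphertext
    return c1, pow(c2, rk, P)


def dec(sk, ciphertext):
    """Recover G^r from the second component, then unmask the message."""
    c1, c2 = ciphertext
    g_r = pow(c2, pow(sk, -1, Q), P)
    return (c1 * pow(g_r, -1, P)) % P


pk_alice, sk_alice = keygen()
pk_bob, sk_bob = keygen()
c_alice = enc(pk_alice, 42)
c_bob = reenc(rekeygen(sk_alice, sk_bob), c_alice)
```

Here `dec(sk_alice, c_alice)` and `dec(sk_bob, c_bob)` both return 42, while the proxy running `reenc` only ever raises one ciphertext component to a power. The same decryption routine serves for original and re-encrypted ciphertexts in this toy scheme, so a separate $Dec_R$ is not needed.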


The main strength of this scheme is that it does not require
re-encryption when a user's rights are revoked and hence saves
on computation costs, especially considering the large
number of users in groups. It also does not allow outsiders to view
the original plaintext at any point, as the data remains in an
unreadable format in the cloud. Only users with granted access
rights can view the original plaintext.

The main problem with this scheme is that of collusion
attacks: if a revoked user and the proxy collude, the revoked
user gains access to the private keys of all the other users in
the group. Also, the proxy may suffer from too many encryption
and decryption operations.

Fig. 5. Algorithm


B. Classification of Proxy Re-encryption Schemes 


1) Type and Identity Based Proxy Re-encryption Scheme:
This scheme addresses the problem of multiple delegations
of decryption rights. Suppose the delegator wants two
different users to view different parts of his message. One
solution is to trust the proxy to re-encrypt only the
selected parts of the ciphertexts, but this fails
if the proxy is corrupted. A better but unrealistic alternative is
to choose a separate key pair for each delegate. The
type-and-identity-based proxy re-encryption scheme is based on the
Boneh-Franklin Identity-Based Encryption scheme and enables
different access control policies for ciphertexts
against multiple receivers. The messages are categorized
into different types according to the decryption rights of the
intended receivers. The main benefit of this scheme is that the
delegator needs only a single key pair to give the proxy
re-encryption capability for his ciphertexts against all his
receivers. However, the scheme works only for ciphertexts
generated by the sender.

The identity-based encryption (IBE) scheme consists
of four algorithms: Setup, Extract, Encrypt and Decrypt.
Unlike a traditional public-key encryption scheme, IBE
does not require a digital certificate to certify the encryption
key (public key), because the public key of any user can be an
arbitrary string such as an email address or IP address. A
private key generator (PKG) generates each user's private
key. IBE is well suited to healthcare, where it can be used
to exchange emails more securely. For example, in Figure 1,
when Alice wants to send an encrypted email to Bob, she
encrypts it using the encryption key derived from
Bob's identity and sends it over an insecure channel. Bob
authenticates himself to the PKG to obtain the decryption
key (private key), and once the key has been generated he can
decrypt the email. Unlike in traditional public-key
encryption schemes, where the private key and the public key
have to be created simultaneously, in IBE the private key can
be generated long after the corresponding public key.

2) Key Private Proxy Re-encryption Scheme: Key-Private
Proxy Re-Encryption, also known as Anonymous Proxy
Re-Encryption, introduces the notion of keeping the keys private
so that even the proxy that performs the transformation of a
message cannot identify or differentiate between the involved
users. None of the early PRE schemes provided key privacy.
This scheme is CPA-secure, but work is still in progress
on CCA-secure key-private PRE schemes. If a proxy
communicates with multiple users, it should not be able to
reveal to a user which other parties are communicating with it,
either from the messages being transmitted or from the set of
re-encryption keys available. This information should not lead
back to the users. The benefit of a key-private scheme is that
nobody can detect who has access to a certain message, i.e.,
the users involved in a communication are completely anonymous.

3) Ciphertext-Policy Attribute based Proxy Re-encryption
Scheme: Ciphertext-Policy ABPRE is a joint construction of
attribute-based encryption and the traditional proxy re-encryption
scheme. It is proven to be secure against CPA. It is a
type of ABE in which the ciphertext is associated with an access
structure, namely a policy over attributes defining the type of
user that should be given access and decryption rights. This
solves the issue of multiple users and key distribution over
a large audience. Key management creates an overhead in
such situations, and this algorithm is beneficial in this context.
Recent variations of this algorithm are proven secure against
chosen-ciphertext attacks under the decisional q-parallel BDH
assumption. This algorithm has widespread applications in
medical domains, where patient records are continuously being
transferred and referred from one doctor or facility to another.
It gives the user fine-grained access control over the
delegates, enabling the user to specify who can decipher the
data or message by attaching to it a set of attributes. The CP-ABPRE
scheme is a collusion-resistant unidirectional scheme and is
associated with a monotonic access structure. A CCA-secure
version of CP-ABPRE has also been constructed.

4) Attribute based Proxy Re-encryption Scheme: Attribute-based
proxy re-encryption schemes provide a better
option, especially when impersonation of a user is an active issue.
Moreover, the problem of authenticating a user is easily
solved by this approach. Attribute-based PRE involves various user
attributes like city, country, street number, GPS coordinates, or
any other set of attributes that are predefined at encryption time.
Decryption of a message is possible and allowed only when a
user possesses these attributes. The identification
of these attributes is based on a certain threshold, i.e., if the
attributes of the receiver match the required attribute set to a
certain degree or level, decryption access is granted and
the message can be decrypted using only these attributes
and the secret key. If the attributes fall short of the
threshold, even by a single attribute, the whole decryption fails.
This is a general scheme of which various modifications exist,
namely ciphertext-policy attribute-based encryption and
key-policy attribute-based encryption, which are widely
implemented. This mechanism is combined with proxy
re-encryption and implemented in various categories.
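
The threshold rule described in this paragraph can be sketched as a plain set comparison. This is an illustration only, with no cryptography; the function name `threshold_match` and the parameter `d` (the minimum number of matching attributes) are assumptions, not notation from the surveyed schemes.

```python
def threshold_match(required, held, d):
    """Grant access only if at least d of the required attributes are held."""
    return len(set(required) & set(held)) >= d


# Hypothetical attribute set fixed at encryption time.
required = {"city", "country", "street_number", "gps"}
```

With `d = 3`, a receiver holding `{"city", "country", "gps"}` passes the check, while one holding only `{"city", "country"}` fails: falling one attribute short of the threshold is enough to block decryption.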

5) Conditional Proxy Re-encryption Scheme: For situations
where fine-grained delegation is required, subject to fulfillment
of a predetermined condition, the notion of conditional proxy
re-encryption (C-PRE) was introduced, whereby only a
ciphertext satisfying a condition set by the sender is allowed
to be transformed and then decrypted by the receiver. The scheme
is proven to be CCA-secure. The scheme has since been improved
to work with multiple conditions rather than the single condition
of its initial version. The conditions can be anything specified
by the involved parties and the construction of the algorithm:
a set of predefined integers, the sending or
receiving conditions of the parties, or the physical location of
the sender or the receiver. The message to be sent is encrypted
using the receiver's public key and the condition. Similarly, to
decrypt the message the receiver must meet the predefined
conditions. The remaining challenge is to construct
CCA-secure C-PRE schemes with anonymous conditions rather than
known predefined conditions.

6) Time/Clock Based Proxy Re-encryption Scheme: In a
time-based re-encryption scheme, each cloud server is allowed
to re-encrypt data independently and automatically, in contrast to
the previous methods, where the data was re-encrypted only
after receiving a command from the sender. Re-encryption is
thus triggered by the internal time of
the cloud servers rather than by manual commands. Every
piece of data stored in the cloud is associated with a set of
attributes that define the type of user the data is meant for and
a time structure that specifies the time limit for
which the data will be accessible to the user. The receiver is
issued keys that become effective during the specified access
times, implying that the receiver can decrypt the message
using only those keys which match the access time. The data
owner and the Cloud Service Provider (CSP) share a secret key.
This key is later used, along with the clock time of the system,
to create sub-keys for the users and to re-encrypt the data.
The algorithm is based on the Bilinear Diffie-Hellman
assumption, like most proxy re-encryption schemes.

The algorithm operates as follows. First, the
algorithm is set up by generating the master key and public key and
defining a universal attribute set from which the individual
attributes will later be selected. Then the CSP identifies all
its users and generates secret keys for them based on their
attribute sets. The data is then encrypted based on the
above-mentioned access structure. When a user requests
certain data, it is re-encrypted with the internal time of the
system, which sets up a valid access time for decryption
by the user. Therefore, a user satisfying the access structure, i.e.,
the attribute set, can successfully decrypt if the time
has not expired. In Fig. 8, Alice and Bob get the encrypted data
from the outsourced encrypted database in the cloud; they
can then decrypt the message using only those keys which
match the access time.
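
The idea that a key is valid only for a given access time can be sketched with a keyed hash. The example below is not the pairing-based construction the scheme actually uses: it assumes a hypothetical derivation in which a sub-key is computed from the shared secret, an attribute and a time period using HMAC-SHA256, and it models only the rule that keys must match the access time.

```python
import hashlib
import hmac


def period_key(shared_secret: bytes, attribute: str, period: str) -> bytes:
    """Derive a sub-key bound to one attribute and one time period."""
    message = f"{attribute}|{period}".encode()
    return hmac.new(shared_secret, message, hashlib.sha256).digest()


def can_decrypt(user_keys, shared_secret, attribute, current_period):
    """Stand-in for decryption: succeeds only with a key for this period."""
    expected = period_key(shared_secret, attribute, current_period)
    return any(hmac.compare_digest(key, expected) for key in user_keys)


# Hypothetical values: a secret shared by the data owner and the CSP, and a
# user who was granted the "staff" attribute for April 2018 only.
secret = b"owner-csp-shared-secret"
alice_keys = [period_key(secret, "staff", "2018-04")]
```

Here `can_decrypt(alice_keys, secret, "staff", "2018-04")` succeeds, but the same keys are useless once the server's clock moves to `"2018-05"`. Revocation therefore happens automatically when the period expires, without any command from the data owner.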

7) Threshold Proxy Re-encryption Scheme: There are three
problems in a decentralized cloud storage system. First, a high
level of traffic between the user and storage servers leads
to more computation by the user. Second, key management
becomes a problem for the user, because security is broken if
the user's keys are compromised. Third, directly forwarding a
user's messages to another user is not feasible. The proposed
system is built around a scheme named
Threshold Proxy Re-Encryption. Initially, the cloud
storage system stores user details in a database. The user
needs to register in the database by entering data
such as user_name, user_gender, user_location, user_password,
user_birthdate and user_e-mail address. The user then logs
into the system using the credentials that were registered.
A file is forwarded in a folder along with the
names of the user and the recipient, a security question for
decryption access, the file containing the key for decryption and
the status of the message. The file is transferred using the
receiver's email and public key. After the file reaches the
receiver, the selected file can be downloaded. Before downloading
the file, however, the receiver has to download the key file that
was sent in the same folder. To download the key file, the
receiver has to enter the file name, the security question and
its answer. The key is then revealed to the receiver, with which
the message can be downloaded and decrypted.


III. ANALYSIS 


i) Type Based Proxy Re-Encryption Scheme: This
scheme provides semantic security and ciphertext privacy
control, but encoding operations over encrypted
messages are not possible, which limits its
widespread use.

ii) Key-Private Proxy Re-Encryption Scheme: This
scheme provides security against chosen-ciphertext
attack, but the privacy proof of this scheme is more
difficult than for chosen-plaintext attack.

iii) Identity-based Proxy Re-Encryption Scheme: This
scheme is secure against adaptive CCA, but it is
difficult to find constructions of the algorithm
that are multi-use, efficient and CCA-secure.

iv) Ciphertext Policy Attribute-Based Proxy Re-Encryption
Scheme: This scheme provides fine-grained
access control over data by limiting the decryption
rights based on various attributes of the
receiver, but it has only average efficiency and flexibility
compared to the other schemes.

v) Conditional Proxy Re-Encryption Scheme: This
scheme provides a very efficient mechanism against
CCA, but it is very difficult to design C-PRE schemes
that are CCA-secure.

vi) Time based Proxy Re-Encryption Scheme: This
scheme is a more recent modification of PRE schemes
which provides scalable user revocation and reduces
the workload of data owners. The major disadvantage
of this scheme is that it requires the effective time
period to be the same for all attributes associated with the
user.

vii) Threshold Proxy Re-Encryption Scheme: This
scheme enables efficient data forwarding, but it
requires very strict access control, which is very
difficult to provide.


IV. COMPARISON OF PROXY RE-ENCRYPTION SCHEMES 


The following table (see Figure 12) shows a comparative 
study of the PRE schemes discussed above based on the 
properties of directionality, multi-use, transitivity, interactivity, 
security, key-privacy, collusion resistance, and the assumption 
on which the algorithm is built: 


[Table: PRE schemes (key-private, type-based, identity-based,
conditional, clock-based, threshold and ciphertext-policy PRE)
compared on directionality (unidirectional/bidirectional),
multiple-use, transitivity, non-interactivity, key privacy,
collusion resistance, fine-grained delegation, ciphertext privacy,
key pairs, security (CPA/CCA) and underlying assumption
(BDH, DBDH and related assumptions).]

Fig. 12. Comparison of proxy re-encryption schemes


V. CONCLUSION 


This paper briefly discusses various proxy re-encryption
schemes, their general mechanism and implementation, and
the cloud computing aspects of proxy re-encryption.
The schemes are broadly classified based on directionality, and
a comparison is given after analyzing the schemes for traits
that should be part of every successful proxy re-encryption
algorithm. Future work on proxy re-encryption should include
features of key privacy and transitivity: most schemes
are collusion resistant and key-private, but an efficient
mechanism that also provides transitivity is missing.


Abstract—The rapid growth and spread of network services,
mobile devices and online users on the Internet has led to a
remarkable increase in the amount of data. Almost every
industry is trying to cope with this huge volume of data. The big
data phenomenon has begun to gain importance; however,
it is not only very difficult to store and analyse big data with
traditional applications, but big data also poses challenging
privacy and security problems. In this paper we review the
concept of big data, its challenges and issues, its security, and
some possible techniques to ensure security in the Hadoop
architecture.


I. INTRODUCTION 


The amount of data in the world is growing day by day, driven
by the use of the Internet, smartphones and social
networks. Big data is a collection of data sets which is very
large in size as well as complex. The size of the data is
generally measured in petabytes and exabytes. Traditional
database systems are not able to capture, store and analyze this
large amount of data. As the Internet grows, the amount of big
data continues to grow. Big data analytics provides new ways for
businesses and governments to analyze unstructured data.
Nowadays, big data is one of the most talked-about topics in the
IT industry, and it is going to play an important role in the
future. Big data changes the way that
data is managed and used. Some of its applications are in
areas such as healthcare, traffic management, banking, retail
and education. Organizations are becoming more flexible
and more open. Big data is an abstract concept. Apart from
masses of data, it also has some other features, which determine
the difference between itself and massive data or very big data.

"Big data is a collection of data sets so large and complex
that it becomes difficult to process using on-hand database
management tools or traditional data processing applications.
The challenges include capture, curation, storage, search,
sharing, transfer, analysis, and visualization." Big Data refers
to new database management and analytical approaches developed
for analyzing, storing, and manipulating large or complex data.
Investments in Big Data include those in human resources
(e.g., data scientists) and in business and technology solutions,
including database management platforms (e.g., Hadoop,
IBM/Netezza), analytics and visualization capabilities (e.g.,
Revolution R) or text-processing and real-time streaming
solutions.

Big Data refers to datasets whose size is beyond the ability
of typical database software tools to capture, store, manage and
analyse. There is no explicit definition of how big a dataset
should be in order to be considered Big Data. New technology
has to be in place to manage this Big Data phenomenon. IDC
defines Big Data technologies as a new generation of technologies
and architectures designed to extract value economically
from very large volumes of a wide variety of data by enabling
high-velocity capture, discovery and analysis. Big data is data
that exceeds the processing capacity of conventional database
systems. The data is too big, moves too fast, or does not fit
the structures of existing database architectures. To gain value
from these data, there must be an alternative way to process
them.

II. CHARACTERISTICS OF BIG DATA


The characteristics of big data depend on three
factors: data volume, data variety and data
velocity. Big Data is not just about the size of data but also
includes data variety and data velocity. These are the three Vs
of Big Data.


Fig. 1. 3V's of Big Data


Volume - This refers to the quantity of data that is generated.
The word Big in Big Data itself points to volume. At present
the existing data is in the order of petabytes and is expected to
grow to zettabytes in the near future. Social networking sites
alone produce data in the order of terabytes every day, and
this amount of data is difficult to handle using existing
traditional systems.

Variety - The data being produced is not of a single category:
it includes not only traditional structured data but also
semi-structured data from various sources such as web pages,
web log files, social media sites, e-mail, documents and sensor
devices, both active and passive. This data is heterogeneous,
consisting of raw, structured, semi-structured and even
unstructured data, which is difficult to handle with existing
traditional analytic systems.

Velocity - Velocity in Big Data deals with the speed of the
data coming from various sources. This characteristic is not
limited to the speed of incoming data; it also covers the speed
at which the data flows. For example, the data from sensor
devices moves constantly to the database store, and the amount
is not small. Thus traditional systems are not capable of
performing analytics on data that is constantly in motion.

Variability - This factor can be a problem for those who
analyse the data. It refers to the inconsistency which the data
can show at times, hampering the ability to handle and manage
the data effectively.

Complexity - Data management can become a very complex
process, especially when large volumes of data come from
multiple sources. These data need to be linked, connected and
correlated in order to grasp the information that they are
supposed to convey. This situation is therefore termed the
complexity of Big Data.


III. CHALLENGES OF BIG DATA


The sharply increasing data deluge in the big data era brings
huge challenges in data acquisition, storage, management
and analysis. Traditional data management and analysis
systems are based on the relational database management
system (RDBMS). However, such RDBMSs apply only to
structured data, not to semi-structured or unstructured
data. In addition, RDBMSs increasingly rely on more and
more expensive hardware. It is apparent that traditional
RDBMSs cannot handle the huge volume and heterogeneity
of big data. The research community has proposed
solutions from different perspectives. For example, cloud
computing is utilized to meet the infrastructure requirements
of big data, e.g., cost efficiency, elasticity, and smooth
upgrading/downgrading. For permanent storage and
management of large-scale disordered datasets, distributed
file systems and NoSQL databases are good choices.
Programming frameworks such as MapReduce have achieved
great success in processing clustered tasks, especially for
webpage ranking. Various big data applications can be
developed based on these innovative technologies or platforms.
Moreover, it is non-trivial to deploy big data analysis systems.


1) Data representation 


Many datasets have certain levels of heterogeneity in 
type, structure, semantics, organization, granularity, and 
accessibility. Data representation aims to make data more 
meaningful for computer analysis and user interpretation. 
Nevertheless, an improper data representation will reduce 
the value of the original data and may even obstruct 
effective data analysis. Efficient data representation shall 
reflect data structure, class, and type, as well as integrated 
technologies, so as to enable efficient operations on 
different datasets. 

2) Redundancy reduction and data compression
Generally, there is a high level of redundancy in datasets.
Redundancy reduction and data compression are effective
in reducing the indirect cost of the entire system, on the
premise that the potential value of the data is not
affected. For example, most data generated by sensor
networks is highly redundant and may be filtered and
compressed by orders of magnitude.
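
The effect of redundancy on sensor data can be illustrated with a minimal sketch. The readings, the drop_repeats filter and the choice of zlib are assumptions made for illustration only; they are not taken from any particular sensor network.

```python
import zlib

def drop_repeats(readings):
    """Keep a reading only when it differs from the previous one."""
    kept = []
    for i, value in enumerate(readings):
        if i == 0 or value != readings[i - 1]:
            kept.append((i, value))      # (position, new value)
    return kept

# Hypothetical temperature sensor that reports the same value most of the time.
readings = [21.5] * 500 + [21.6] * 300 + [21.5] * 200

# Redundancy reduction: 1000 readings collapse to three change points.
filtered = drop_repeats(readings)

# Lossless compression of the raw stream with the standard zlib module.
raw = ",".join(str(r) for r in readings).encode("ascii")
packed = zlib.compress(raw, 9)
restored = zlib.decompress(packed)

print(len(readings), "readings ->", len(filtered), "change points")
print(len(raw), "bytes ->", len(packed), "bytes compressed")
```

Filtering discards information about how often a value was reported, whereas compression is lossless, so the choice between them depends on whether the potential value of the data is affected.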

3) Data life cycle management
Compared with the relatively slow advances of storage
systems, pervasive sensing and computing are generating
data at unprecedented rates and scales. We are confronted
with many pressing challenges, one of which is that
current storage systems cannot support such massive
data. Generally speaking, the value hidden in big data
depends on data freshness. Therefore, a data importance
principle related to the analytical value should
be developed to decide which data shall be stored and
which data shall be discarded.

4) Analytical mechanism
The analytical system of big data shall process masses
of heterogeneous data within a limited time. However,
traditional RDBMSs are strictly designed and lack
scalability and expandability, so they cannot meet
the performance requirements. Non-relational databases
have shown unique advantages in the processing of
unstructured data and have started to become mainstream in
big data analysis. Even so, non-relational databases still
have some problems in their performance and particular
applications. A compromise solution between RDBMSs and
non-relational databases is needed. For example, some
enterprises have utilized a mixed database architecture
that integrates the advantages of both types of database
(e.g., Facebook and Taobao). More research is needed on
in-memory databases and on approximate analysis based on
sample data.

5) Data confidentiality
Most big data service providers or owners at present cannot
effectively maintain and analyze such huge datasets
because of their limited capacity. They must rely on
professionals or tools to analyze such data, which increases
the potential safety risks. For example, a transactional
dataset generally includes a set of complete operating data
to drive key business processes. Such data contains details
of the lowest granularity and some sensitive information
such as credit card numbers. Therefore, big data may be
delivered to a third party for analysis only when proper
preventive measures are taken to protect such sensitive
data and ensure its safety.
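
One possible preventive measure is to mask card numbers before a record leaves the organization. The sketch below is illustrative only: the pattern, the function name mask_card_numbers and the sample record are assumptions, and a real deployment would need a vetted masking or tokenization tool.

```python
import re

# Matches 13 to 16 digit card-like numbers, optionally grouped by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")

def mask_card_numbers(text):
    """Replace all but the last four digits of card-like numbers with '*'."""
    def _mask(match):
        digits = re.sub(r"\D", "", match.group())   # strip separators
        return "*" * (len(digits) - 4) + digits[-4:]
    return CARD_PATTERN.sub(_mask, text)

# Hypothetical transaction record containing a test card number.
record = "order=1017 card=4111 1111 1111 1111 amount=250"
print(mask_card_numbers(record))
# order=1017 card=************1111 amount=250
```

Short numbers such as the order id are left untouched because they have fewer than 13 digits.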

6) Energy management
The energy consumption of mainframe computing systems
has drawn much attention from both economic and
environmental perspectives. With the increase of data
volume and analytical demands, the processing, storage,
and transmission of big data will inevitably consume
more and more electric energy. Therefore, system-level
power consumption control and management mechanisms
shall be established for big data while expandability
and accessibility are ensured.

7) Expandability and scalability
The analytical system of big data must support present 
and future datasets. The analytical algorithm must be 
able to process increasingly expanding and more complex 
datasets. 


IV. ISSUES OF BIG DATA 


1) Privacy and Security: This is the most important issue with
Big Data. It is sensitive and has conceptual, technical as
well as legal significance.
• When the personal information of a person is combined
with external large data sets, new facts about that person
can be inferred. Such facts may be private, and the person
might not want the data owner or anyone else to know
about them.
• Information about users is collected and used to add
value to the business of the organization. This is done by
deriving insights into their lives of which they are unaware.
• Another important consequence is social stratification,
where a literate person can take advantage of Big Data
predictive analysis while the underprivileged are easily
identified and treated worse.
• Big Data used by law enforcement increases the
chances that certain tagged people suffer adverse
consequences without the ability to fight back, or even
the knowledge that they are being discriminated against.

2) Data Access and Sharing of Information: If data is to
be used to make accurate decisions in time, it must be
available in an accurate, complete and timely manner.
This makes the data management and governance process
somewhat complex, adding the necessity to make data open
and available to government agencies in a standardized
manner with standardized APIs, metadata and formats, thus
leading to better decision making, business intelligence and
productivity improvements. Expecting companies to share
data with each other is unrealistic because of their need to
keep an edge in business. Sharing data about their clients
and operations threatens the culture of secrecy and
competitiveness.

3) Storage and Processing Issues: The storage available is
not enough for the large amount of data being produced
by almost everything; social media sites are themselves a
great contributor, along with sensor devices. Because of
the rigorous demands of Big Data on networks, storage and
servers, outsourcing the data to the cloud may seem an
option. However, uploading this large amount of data to
the cloud does not solve the problem, since Big Data
insights require collecting all the data and then linking it
in a way that extracts important information. Terabytes of
data take a large amount of time to upload to the cloud,
and moreover the data changes so rapidly that it is hard to
upload in real time. At the same time, the cloud’s
distributed nature is also problematic for Big Data analysis.
Thus the cloud issues with Big Data can be categorized into
capacity and performance issues. The transportation of
data from the storage point to the processing point can be
avoided in two ways: process the data in the storage place
and transfer only the results, or transport to the computation
only that data which is important. Both methods require
the integrity and provenance of the data to be maintained.
Processing such a large amount of data also takes a large
amount of time. To find suitable elements, the whole data
set needs to be scanned, which is often not feasible. Thus
building indexes right at the beginning, while collecting and
storing the data, is a good practice and reduces processing
time considerably.
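
The benefit of indexing at collection time can be sketched as follows. The class IndexedStore, its field names and the sample records are hypothetical and serve only to contrast a full scan with an index lookup.

```python
from collections import defaultdict

class IndexedStore:
    """Append-only record store that maintains an index at ingest time."""

    def __init__(self, key_field):
        self.key_field = key_field
        self.records = []
        self.index = defaultdict(list)   # key value -> positions in self.records

    def add(self, record):
        # The index entry is written at the same moment the record is stored.
        self.index[record[self.key_field]].append(len(self.records))
        self.records.append(record)

    def scan(self, value):
        """Full scan: touches every stored record."""
        return [r for r in self.records if r[self.key_field] == value]

    def lookup(self, value):
        """Index lookup: touches only the matching records."""
        return [self.records[i] for i in self.index.get(value, [])]

store = IndexedStore("sensor")
for i in range(10000):
    store.add({"sensor": "s%d" % (i % 100), "reading": i})

print(len(store.lookup("s7")), "records found without scanning all 10000")
```

Both methods return the same records; the lookup does so without visiting the 9900 records that do not match, at the cost of extra memory for the index.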


V. BIG DATA SECURITY AND PRIVACY IN HEALTH CARE 


The new wave of digitizing medical records has seen a
paradigm shift in the healthcare industry. With the
ever-increasing cost of healthcare and increased health insurance
premiums, there is a need for proactive healthcare and wellness.
As a result, the healthcare industry is witnessing an increase
in the sheer volume of data in terms of complexity, diversity
and timeliness. As healthcare experts look for every possible
way to lower costs while improving care process, delivery and
management, big data emerges as a plausible solution with the
promise to transform the healthcare industry. This paradigm
shift from reactive to proactive healthcare can result in an
overall decrease in healthcare costs and eventually lead to
economic growth. While the healthcare industry harnesses the
power of big data, security and privacy issues are at the focal
point as emerging threats and vulnerabilities continue to grow.
As the healthcare industry explores myriad ways of applying big
data analysis, from diagnosis to treatment to population health
management and eventually capital and strategic planning, the
opportunities are endless. The explosion of the Internet of
Things and its ability to provide real-time monitoring and
expedited access to care is one of the driving factors for
its adoption in healthcare. Adoption of big data in healthcare
significantly increases security and patient privacy concerns. At
the outset, patient information is stored in data centers with
varying levels of security. A big data healthcare cloud hosts
clinical, financial, social, genomic, physical and psychological
data pertaining to patients.

I. Data Governance: As the healthcare industry moves
towards a value-based business model leveraging healthcare
analytics, data governance will be the first step in regulating
and managing healthcare data.

II. Real-time Security Analytics: Analysing security risks and
predicting threat sources in real time is of utmost need in
the burgeoning healthcare industry.

III. Privacy-Preserving Analytics: Invasion of patient privacy
is a growing concern in the domain of big data analytics.
Privacy-preserving encryption schemes that allow running
prediction algorithms on encrypted data while protecting the
identity of a patient are essential for driving healthcare
analytics.

The wider acceptance and use of Big Data across the world
has given many dimensions to real-time monitoring and has
opened opportunities to get connected anywhere, anytime,
with almost anything in the near future.


VI. SECURITY IMPLEMENTATION 


With the advent of Big Data comes the risk of greater
security breaches as data volumes increase. Many companies
are still trying to evaluate the potential of Big Data, let alone
investigate the risks associated with Hadoop and the Cloud.
In the quest for new ways to house and exploit increasing
amounts of unstructured data, companies need to ensure they
have mechanisms in place which allow them to meet government
compliance regulations for data protection. Concerns
about the security of stored data represent a significant barrier
to the widespread adoption of Big Data, and in response,
a number of companies are emerging with new products
that secure data in ways which are practically transparent to
the user. Here we discuss the Hadoop architecture and some
methods for securing Hadoop.


A. Hadoop Architecture 


Hadoop is a free, Java-based programming framework that
supports the processing of large data sets in a distributed
computing environment. Hadoop allows running applications on
systems with thousands of nodes and thousands of terabytes
of data. Its distributed file system supports fast data transfer
rates among nodes and allows the system to continue operating
uninterrupted when a node fails.

Hadoop consists of a distributed file system, data storage and
analytics platforms, and a layer that handles parallel
computation, workflow and configuration administration.
HDFS runs across the nodes in a Hadoop cluster and
connects the file systems on many input and output data nodes
to make them into one big file system. The present Hadoop
ecosystem (as shown in Fig. 2) consists of the Hadoop kernel,
MapReduce, the Hadoop distributed file system (HDFS) and
a number of related components such as Apache Hive, HBase,
Oozie, Pig, Sqoop and Zookeeper. Several of these components
are described below.

Fig. 2. Hadoop Architecture

• HDFS: A highly fault-tolerant distributed file system that
is responsible for storing data on the clusters.

• MapReduce: A powerful parallel programming technique
for distributed processing of vast amounts of data on
clusters.

• HBase: A column-oriented distributed NoSQL database
for random read/write access.

• Pig: A high-level data programming language for
analyzing data in Hadoop computations.

• Hive: A data warehousing application that provides
SQL-like access and a relational model.

• Sqoop: A project for transferring/importing data between
relational databases and Hadoop.

• Oozie: An orchestration and workflow management tool for
dependent Hadoop jobs.
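
The MapReduce programming model listed above can be illustrated with a small word count. This is a single-process sketch in plain Python, not Hadoop code: the function names map_phase, shuffle and reduce_phase and the two input splits are assumptions chosen to mirror the phases of the model.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine the values for one key into a single result."""
    return key, sum(values)

# Two hypothetical input splits; on a cluster each would go to a different node.
splits = ["big data needs big storage", "Hadoop stores big data"]
mapped = [pair for split in splits for pair in map_phase(split)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["big"], counts["data"], counts["hadoop"])   # 3 2 1
```

Because each call to map_phase and reduce_phase depends only on its own input, the calls can be distributed across nodes, which is what makes the model suitable for clusters.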


Originally Hadoop was developed without security in
mind: there was no security model, no authentication of users
and services and no data privacy, so anybody could submit
arbitrary code to be executed. Although auditing and authorization
controls were used in earlier distributions, such access control
was easily evaded because any user could impersonate any
other user. Because impersonation was frequent and practised by
most users, the security controls that did exist
were not very effective. Later, authorization and authentication
were added, but these too have some weaknesses. The Hadoop
community supports some security features through the current
Kerberos implementation, the use of firewalls, and basic
HDFS permissions and ACLs. Kerberos is not a compulsory
requirement for a Hadoop cluster, making it possible to run
entire clusters without deploying or implementing any security.


B. Hadoop Security Solution


Hadoop is a distributed system which allows us to store huge
amounts of data and process the data in parallel. Hadoop
is used as a multi-tenant service and stores sensitive data such
as personally identifiable information or financial data. The
Hadoop ecosystem consists of various components, and all
of them need to be secured. Here
we discuss various security techniques for protecting Hadoop
components.

1) Authentication

Authentication is verifying the identity of a user or system
accessing the system. Hadoop provides Kerberos as its primary
authentication mechanism. Initially SASL/GSSAPI was used to
implement Kerberos and mutually authenticate users, their
applications, and Hadoop services over RPC connections.
The Hadoop components support the SASL framework,
i.e. the RPC layer can be changed to support SASL-based
mutual authentication, viz. SASL DIGEST-MD5 authentication
or SASL GSSAPI/Kerberos authentication.
MapReduce supports Kerberos authentication, SASL
DIGEST-MD5 authentication, and also delegation token
authentication on RPC connections. In HDFS, communication
between the NameNode and DataNodes is over
RPC connections, and mutual Kerberos authentication is
performed between them [15]. HBase supports SASL
Kerberos secure client authentication via RPC and HTTP.
Hive supports Kerberos and LDAP authentication for
user authentication, as well as authentication via Apache Knox.
Pig uses the user credentials to submit the job to Hadoop,
so no additional Kerberos security authentication is
required, but before starting ...
2) Authorization and ACLs

Authorization is the process of specifying access control
privileges for a user or system. In Hadoop, access
control is implemented using file-based permissions
that follow the UNIX permissions model. Access control
to files in HDFS can be enforced by the NameNode
based on file permissions and ACLs of users and groups.
MapReduce provides ACLs for job queues, which define
which users or groups can submit jobs to a queue and
change queue properties. Hadoop offers fine-grained
authorization using file permissions in HDFS, resource-level
access control using ACLs for MapReduce, and
coarser-grained access control at the service level. HBase
offers user authorization on tables and column families. This
user authorization is implemented using coprocessors.
Coprocessors are like database triggers in HBase: they
intercept any request to the table before and after it is
processed. Project Rhino can be used to extend HBase
support to cell-level ACLs. In Hive, authorization is
implemented using Apache Sentry. Pig provides authorization
using ACLs for job queues. Although Hadoop can be set up to
perform access control via user and group permissions and
Access Control Lists (ACLs), this may not be sufficient for every
organization. Nowadays many organizations use flexible
and dynamic access control policies based on XACML
and Attribute-Based Access Control.
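
The UNIX permissions model mentioned above can be sketched in a few lines. This is a simplified illustration, not Hadoop code: the function can_access and the user, group and file names are hypothetical, and the extended ACL entries that HDFS also supports are not modelled.

```python
import stat

def can_access(mode, owner, group, user, user_groups, want):
    """Check a UNIX-style permission request.

    want is a string made of 'r', 'w' and 'x'. Exactly one class of
    bits applies: owner, else group, else other.
    """
    if user == owner:
        bits = {"r": stat.S_IRUSR, "w": stat.S_IWUSR, "x": stat.S_IXUSR}
    elif group in user_groups:
        bits = {"r": stat.S_IRGRP, "w": stat.S_IWGRP, "x": stat.S_IXGRP}
    else:
        bits = {"r": stat.S_IROTH, "w": stat.S_IWOTH, "x": stat.S_IXOTH}
    return all(mode & bits[p] for p in want)

# Hypothetical file with mode rw-r----- owned by alice, group analysts.
mode = 0o640
print(can_access(mode, "alice", "analysts", "alice", {"staff"}, "rw"))   # True
print(can_access(mode, "alice", "analysts", "bob", {"analysts"}, "w"))   # False
print(can_access(mode, "alice", "analysts", "carol", {"guests"}, "r"))   # False
```

The sketch shows why this model is coarse: a decision depends only on three classes of users, which is the limitation that attribute-based policies are meant to address.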

3) Encryption

Encryption ensures confidentiality and privacy of user
information, and it secures the sensitive data in Hadoop.
Hadoop is a distributed system running on distinct machines,
which means that data must be transmitted over
the network on a regular basis. There is also an increasing
demand to move sensitive information into the Hadoop
ecosystem to generate valuable insights. The Simple
Authentication and Security Layer (SASL) authentication
framework is used for encrypting data in motion
in the Hadoop ecosystem. SASL security protects
the data being exchanged between clients and servers
and makes sure that the data is not readable by a
man-in-the-middle. SASL supports various authentication
mechanisms, for example DIGEST-MD5, CRAM-MD5, etc.
Data at rest can be protected in two ways. First,
when a file is stored in Hadoop, the complete file can
be encrypted first and then stored. In this
approach, the data blocks in each DataNode cannot be
decrypted until all the blocks are put back together to
recreate the entire encrypted file. Second, encryption can be
applied to data blocks once they are loaded into the Hadoop
system. To protect data in transit and at rest, encryption and
masking techniques can be implemented.

4) Audit Trails

A Hadoop cluster hosts sensitive information, and the security
of this information is of utmost importance for organizations
to have a successful and secure big data journey. There is
always a possibility of security breaches through
unintended or unauthorized access, or inappropriate access
by privileged users. HDFS and MapReduce provide base
audit support. The Apache Hive metastore maintains audit
(who/when) information for Hive interactions. Apache
Oozie, the workflow engine, provides an audit trail for
services; workflow submissions are recorded in the Oozie
log files. Hue also supports audit logs. For those Hadoop
components which do not provide built-in audit logging,
audit log monitoring tools can be used.
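
The kind of who/when information an audit trail records, and a simple monitoring query over it, can be sketched as follows. The record layout and the functions audit_record and denied_by_user are assumptions for illustration; they do not reproduce the log format of HDFS, Hive or Oozie.

```python
import json
from datetime import datetime, timezone

def audit_record(user, action, resource, allowed):
    """Build one who/when/what audit entry as a JSON line."""
    entry = {
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "resource": resource,
        "allowed": allowed,
    }
    return json.dumps(entry, sort_keys=True)

def denied_by_user(lines):
    """Count denied requests per user from a sequence of audit lines."""
    counts = {}
    for line in lines:
        entry = json.loads(line)
        if not entry["allowed"]:
            counts[entry["user"]] = counts.get(entry["user"], 0) + 1
    return counts

# Hypothetical audit log with one permitted and two denied requests.
log = [
    audit_record("alice", "open", "/data/claims.csv", True),
    audit_record("bob", "delete", "/data/claims.csv", False),
    audit_record("bob", "open", "/data/payroll.csv", False),
]
print(denied_by_user(log))   # {'bob': 2}
```

A monitoring tool would run queries of this kind continuously and raise an alert when a user accumulates repeated denials.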


VII. CONCLUSION 


This paper described big data and its importance. To
accept and adapt to this new technology, many challenges and
issues need to be addressed right at the beginning,
before it is too late. These issues and challenges have
been described in this paper. Business organizations that are
moving towards this technology to increase the value of their
business can consider them right at the beginning and find
ways to counter them. Hadoop tools for big data were described
in detail, focusing on the areas where they need to be improved,
so that in future big data can have the technology as well as
the skills to work with.

Abstract: This paper investigates watermarking techniques as
used in daily life. Digital media has become a common alternative
to paper media. As technology has grown, digital media has come
to require protection while being transferred over the Internet or
other media, and watermarking techniques have been developed
to fulfil this requirement. This paper aims to provide a detailed
survey of watermarking techniques, focusing especially on image
watermarking types and their applications in today’s world.
Intellectual property (IP) block reuse is essential for facilitating
the design process of a system-on-a-chip, but sharing IP designs
poses significant security risks. Recently, digital watermarking
has emerged as a candidate solution for copyright protection of
IP blocks. This paper covers the definition and concept of
watermarking and the main contributions in this field, such as
the categories of watermarking processes that indicate which
watermarking method should be used. It starts with an overview
and then covers classification, features, framework, techniques,
applications, challenges, limitations and performance metrics of
watermarking, together with a comparative analysis of some
major watermarking techniques.

Index Terms: Image watermarking, IP watermarking, video
watermarking, attacks, applications.


I. INTRODUCTION 


Today’s generation is witness to the development of digital
media. A very simple example of digital media is a photo
captured by a phone camera. The use of digital media is
common in the present era. Other examples of digital media
are text, audio, video etc. The Internet is the fastest medium for
transferring data to any place in the world. As this technology
has grown, the threat of piracy and copyright infringement has
become an obvious concern for owners. Watermarking is a
process for securing data against these threats, in which owner
identification (the watermark) is merged with the digital media
at the sender end, and at the receiver end this owner
identification is used to verify the authenticity of the data. This
technique can be applied to all digital media types such as
image, audio, video and documents.

For many years researchers and developers have worked in this
area to obtain the best results. The wide availability of reusable
virtual components or intellectual property blocks (IPs) is
most effective when it comes to reducing the cost and
development time of SoC designs. A watermark is digital data
embedded in multimedia objects such that the watermark can be
detected or extracted at a later time in order to make an
assertion about the object. The main purpose of digital
watermarking is to embed information imperceptibly and
robustly in the host data. Typically the watermark contains
information about the origin, ownership, destination, copy
control, transaction etc. Potential applications of digital
watermarking include transaction tracking, copy control,
authentication and legacy system application based products.
Among the various available video standards, H.264 / Advanced
Video Coding (AVC) is becoming an important alternative
because of its reduced bandwidth, better image quality in terms
of peak signal-to-noise ratio (PSNR) and network friendliness,
but it requires higher computational complexity.

In general, watermarks can be classified as robust or fragile:
a digital watermark is fragile if it fails to be detectable
after the slightest modification, and robust if it resists a
designated class of transformations. Depending on the
application requirements, a fragile or a robust watermarking
technique can be appropriate. If the desired behavior is integrity
proof (tamper detection), then a fragile watermark is enough;
whereas, if a watermark is used to carry copyright notices and
prevent unauthorized copies, it is important that it is robust
and can survive the many attacks that may be thrown at it to
evade theft detection and prosecution.

The term digital watermarking first appeared in 1993,
when Tirkel presented two watermarking techniques to hide
watermark data in images. The success of the Internet,
cost-effective and popular digital recording and storage
devices, and the promise of higher bandwidth and quality of
service for both wired and wireless networks have made it
possible to create, replicate, transmit, and distribute digital
content in an effortless way. The protection and enforcement
of intellectual property rights for digital media has become
an important issue. Digital watermarking is a technology
that provides security, data authentication and
copyright protection for digital media. Digital watermarking
is the embedding of a signal or secret information (i.e. the
watermark) into digital media such as image, audio and video.
Later the embedded information is detected and extracted to
reveal the real owner or identity of the digital media.
Watermarking is used for the following purposes: proof of
ownership (copyrights and IP protection), copying prevention,
broadcast monitoring, authentication and data hiding.
Watermarking consists of two modules: a watermark embedding
module and a watermark detection and extraction module.
Digital watermarking technology has many applications in
protection, certification, distribution and anti-counterfeiting of
digital media and in labelling user information. It has become a
very important study area in information hiding. This paper
analyzes the key technologies of digital watermarking and
explores their application in digital image copyright protection.


II. IMAGE WATERMARKING 


Image watermarking is the technique of embedding the
owner’s copyright identification in the host image. When and
how watermarking was first used is a topic of discussion, but
it can be traced to Bologna, Italy, in 1282. At first it was used
in paper mills as the paper mark of the company. It then
remained in common practice up to the 20th century. Later,
watermarks were also used in postage stamps and currency
notes. Digital image watermarking is derived from
steganography, a process in which digital content is hidden
within other content for secure transmission of digital data.
Under particular conditions steganography and watermarking
are very similar, namely when the data to be secured is hidden
during transmission over some carrier. The main difference
between the two processes is that in steganography the hidden
data has the highest priority for sender and receiver, whereas in
watermarking both the source image and the hidden image,
signature or data have the highest priority.


Fig. 1. Image Watermarking (input image I; DFT obtained for
each color of the RGB model; watermark (6,3,2,4,5,1);
watermarked image Iw)


A. Process of image watermarking
The process of watermarking is divided into two parts:


i) Embedding of the watermark into the host image.
ii) Extraction of the watermark from the image.


1) Watermark Embedding: Watermark embedding is done
at the source end. In this process the watermark
is embedded in the host image using a watermarking
algorithm.

2) Watermark Extraction: This is the process of extracting
the watermark from the watermarked image by reversing the
embedding algorithm.
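
The two steps can be sketched with least significant bit (LSB) substitution, one of the simplest spatial domain schemes. LSB substitution is chosen here only as an example and is not singled out by the text above; the pixel values, the watermark bits and the function names embed_lsb and extract_lsb are assumptions.

```python
def embed_lsb(pixels, bits):
    """Write each watermark bit into the least significant bit of a pixel."""
    if len(bits) > len(pixels):
        raise ValueError("watermark is longer than the host image")
    marked = list(pixels)
    for i, bit in enumerate(bits):
        marked[i] = (marked[i] & ~1) | bit   # clear the LSB, then set it to bit
    return marked

def extract_lsb(pixels, n_bits):
    """Reverse of embed_lsb: read the first n_bits least significant bits."""
    return [p & 1 for p in pixels[:n_bits]]

host = [52, 55, 61, 66, 70, 61, 64, 73]   # hypothetical 8-bit grayscale pixels
watermark = [1, 0, 1, 1, 0, 0, 1, 0]

marked = embed_lsb(host, watermark)
print(marked)                          # [53, 54, 61, 67, 70, 60, 65, 72]
print(extract_lsb(marked, 8))          # [1, 0, 1, 1, 0, 0, 1, 0]
```

Each pixel changes by at most one grey level, so the watermark is invisible, but it is also fragile: any operation that alters the low-order bits destroys it.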


B. Watermarking Properties 


Watermarking needs some desirable properties, depending on
the application of the watermarking system.


• Effectiveness: This is the most important property of a
watermark: the watermark should be effective, meaning
it should reliably be detectable. If it is not, the
goal of watermarking is not fulfilled.

• Host Signal Quality: This is also an important property of
watermarking. In watermarking, the watermark is embedded
in a host signal (image, video, audio etc.), which may
affect the host signal. The watermarking system should
therefore change the host signal as little as possible, and
the change should be unnoticeable when the watermark is
invisible.

• Watermark Size: A watermark is often used for owner
identification or security confirmation of the host signal,
and it is always used when data is transmitted. It is
therefore important that the size of the watermark is kept
to a minimum, because it increases the size of the data to
be transmitted.

• Robustness: Robustness is a crucial property for all
watermarking systems. A watermark can be degraded or
altered during transmission, or attacked by hackers in paid
media applications. The watermark should therefore be
robust, so that it withstands all such attacks
and threats.
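
Host signal quality is commonly quantified with the peak signal-to-noise ratio (PSNR) between the original and the watermarked image. The sketch below uses the standard definition, PSNR = 10 log10(peak^2 / MSE); the two pixel lists are hypothetical sample data.

```python
import math

def psnr(original, marked, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(original) != len(marked):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error between corresponding pixels.
    mse = sum((a - b) ** 2 for a, b in zip(original, marked)) / len(original)
    if mse == 0:
        return math.inf          # identical images
    return 10 * math.log10(peak ** 2 / mse)

host = [52, 55, 61, 66, 70, 61, 64, 73]     # hypothetical original pixels
marked = [53, 54, 61, 67, 70, 60, 65, 72]   # same pixels after watermarking

print(round(psnr(host, marked), 2), "dB")
```

A higher PSNR means the watermark has disturbed the host signal less; identical images give an infinite value.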


C. Classification 


Digital watermarking techniques are classified into various
types. The classification is based on several criteria:

• Watermark Type

• Robustness

• Domain

• Perceptivity

• Host Data

• Data Extraction


D. Techniques 


In image watermarking, domain-based techniques are
generally used. They are spatial domain and transform domain
techniques. Transform domain techniques are used more than
spatial domain ones. In a transform domain technique, the
coefficients of the transformed digital image are modified,
rather than the pixel values, which are changed in the spatial
domain. The reverse process is used to extract the watermark
from the watermarked image.
Some of the main transform domain techniques are:


• Discrete Cosine Transform
• Discrete Wavelet Transform
• Discrete Fourier Transform


Any one of these transform techniques can be used for
watermarking on its own, but researchers have recently also
used combinations of them. A combination lets developers
exploit the best features of each individual technique.


E. Applications 


Watermarking technology is applied to any digital media where
security and owner identification are needed [3]. A few of
the most common applications are listed below.


• Owner Identification: The application for which
watermarking was originally developed is identifying the
owner of a piece of media. A paper watermark can be removed
by an attacker with little effort, so the digital watermark
was introduced. A digital watermark is an internal part of
the digital media, so it cannot easily be detected and
removed.

• Copy Protection: Illegal copying can also be prevented by
watermarking with a copy-protect bit. This protection
requires copying devices to integrate watermark-detecting
circuitry.

• Broadcast Monitoring: Broadcasts of TV channels and radio
news can also be monitored by watermarking. This is generally
done with paid media such as sports or news broadcasts.

• Medical Application: Medical media and documents can also
be digitally verified by a watermark that carries information
about the patient and the visiting doctors. These watermarks
can be either visible or invisible. They help doctors and
medical applications to verify that the reports have not been
edited by illegal means.

• Fingerprinting: Each copy of the content is assigned a
unique identification by storing some digital information in
it in the form of a watermark. Detecting the watermark in an
illegal copy can lead to the identification of the person who
leaked the original content. For example, movies played
digitally in cinema halls via satellite carry a watermark
containing the theater identification, so if that
identification is detected in a pirated copy, action can be
taken against the theater.

• Data Authentication: Authentication is the process of
verifying that the received content or data is exactly as it
was sent, with no tampering. For this purpose the sender
embeds a digital watermark in the host data, and the
watermark is extracted and verified at the receiver's end.
Examples of such checks are a CRC (cyclic redundancy check)
or a parity check.
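
The data authentication idea above can be sketched in a few lines. This is only an illustrative sketch, not a method taken from the surveyed papers: it assumes an 8-bit grayscale image held as a flat list of integers, computes a CRC-32 over the seven most significant bits of every pixel, and stores the 32 check bits in the least significant bits of the first 32 pixels. The function names `embed_auth_mark` and `verify_auth_mark` are hypothetical.

```python
import zlib

def _crc_of_msbs(pixels):
    """CRC-32 over the 7 most significant bits of every pixel."""
    return zlib.crc32(bytes(p & 0xFE for p in pixels))

def embed_auth_mark(pixels):
    """Return a copy whose first 32 LSBs hold the CRC-32 of the content."""
    if len(pixels) < 32:
        raise ValueError("need at least 32 pixels to hold a 32-bit CRC")
    crc = _crc_of_msbs(pixels)
    marked = list(pixels)
    for i in range(32):
        bit = (crc >> (31 - i)) & 1            # most significant CRC bit first
        marked[i] = (marked[i] & 0xFE) | bit   # overwrite only the LSB
    return marked

def verify_auth_mark(pixels):
    """True if the CRC stored in the LSBs matches the received content."""
    stored = 0
    for i in range(32):
        stored = (stored << 1) | (pixels[i] & 1)
    return stored == _crc_of_msbs(pixels)

# Toy 8-bit "image" (assumed data, for illustration only).
host = [(37 * i + 11) % 256 for i in range(64)]
sent = embed_auth_mark(host)

tampered = list(sent)
tampered[40] ^= 0x10                           # flip one content bit in transit

print(verify_auth_mark(sent))      # True: content arrived unchanged
print(verify_auth_mark(tampered))  # False: the CRC no longer matches
```

In this sketch the mark changes each pixel by at most one grey level, and a change confined to the unused least significant bits would go unnoticed, so it should be read as a demonstration of the principle rather than a secure scheme.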


III. IP WATERMARKING


Incremental changes to current design methodologies are
inadequate for enabling the full potential of System-on-a-chip
(SOC) implementation. The wide availability of reusable
virtual components, or intellectual property blocks (IPs), is
most effective when it comes to reducing the cost and
development time of SOC designs. Sharing IP designs, however,
poses significant security risks. Most of these IPs need time
and effort to be designed and verified, yet they can easily
be copied, or modified to hide the proof of authorship.
Creators and owners of IP designs want assurances that their
content will not be illegally redistributed, and consumers
want assurances that the content they buy is legitimate.
Watermarking techniques have been widely used throughout
history, for copyright protection as well as data hiding.

Fig. 2. IP Watermarking


A. Levels 


IP blocks are delivered in three main flavors depending
on price, applications, and contracts between companies. The
Virtual Socket Interface (VSI) architecture document describes
these levels as:


• Soft IPs: These are delivered in the form of synthesizable
hardware description language (HDL) code. They have the
advantage of being more flexible and the disadvantage of not
being as predictable in terms of performance (i.e., timing,
area, power). Soft IPs typically carry increased intellectual
property risks because the RTL (register transfer level)
source code is required by the integrator.

• Firm IPs: These are optimized in structure and topology for
performance and area through floor planning/placement,
possibly using a generic technology library. Firm IPs offer a
compromise between soft and hard: they are more flexible and
portable than hard IPs, yet more predictable in performance
and area than soft IPs. Firm IPs include a combination of
synthesizable RTL, a reference technology library, a detailed
floor plan, and a full or partial netlist. Firm IPs do not
include routing. Risks are equivalent to those of soft IPs if
RTL is included and are lower if it is not.

• Hard IPs: These are optimized for power, size, or
performance and mapped to a specific technology. Examples
include netlists that are fully placed and routed, or an
optimized custom physical layout. They have the advantage of
being much more predictable, but consequently are less
flexible and portable because of process dependencies. Hard
IPs require, at a minimum, a high-level behavioral model, a
test list, and full physical and timing models along with the
final layout. Hard IPs are much easier to protect because of
copyright facilities and because no RTL code is required.


The VSI Alliance IP protection development working group
identifies three main approaches to securing IPs. The first
is a deterrent approach, where the owner uses legal means,
i.e., patents, copyrights and trade secrets, to stop attempts
at illegal distribution. The second is a protection approach,
where the owner tries to prevent unauthorized usage of the IP
physically, through license agreements and encryption.
Protection techniques, mostly based on model encryption or a
distributed environment, fall short in securing designs or in
tracking them if they are stolen or reused without
permission. For this reason a third, detection approach was
introduced, where the owner detects and traces both legal and
illegal usages of the designs, as in watermarking or
fingerprinting. This tracking should be strong enough to be
considered as evidence in court if needed. The VSI Alliance
proposed using all three approaches together for proper
protection of IP designs.

In this paper, we outline IP watermarking and fingerprinting
techniques for copyright protection, surveying the current
state of the art of IP digital watermarking research. In
order to evaluate the described techniques, we also define
several evaluation criteria. Finally, we highlight the main
technical problems that must be solved before digital
watermarking can be widely used. The rest of the paper is
organized as follows. It first briefly describes the
preliminaries of digital watermarking that help with reading
the rest of the paper. It then introduces the evaluation
criteria developed, as well as the different attack classes
that a watermarking technique might face. Next, it gives an
overview of the state-of-the-art approaches used for IP
watermarking, describing the main advantages and
disadvantages of each technique and evaluating them against
the criteria defined above. Finally, it extracts the main
guidelines and conclusions for future IP watermarking
research directions.

Steganography is divided into three main application classes:


• Information hiding: utilizes the secrecy and
undetectability of steganography to transfer secret data, and
is used mainly for espionage applications.

• Content verification applications (authentication): a
fragile watermark is introduced to secure the integrity of
the contents.

• Intellectual property protection applications: the
watermark is mainly used to convey information about content
ownership and intellectual property rights. Copyright marking
(widely known as watermarking), as opposed to steganography,
has the additional requirement of robustness against possible
attacks. A robust watermark has the property that it is
infeasible to remove it or make it useless without destroying
the object at the same time. This usually means that the mark
has to be embedded in the most perceptually significant
components of the object.


B. IP Watermarking Evaluation Criteria 


Petitcolas identified a set of measures for watermark
evaluation. Although these measures were developed mainly for
multimedia applications, we find some of them to be essential
when evaluating any IP watermarking technique. Based on
these points and the specific needs of hardware and SOC
design, we defined a set of requirements that any IP
watermarking approach should satisfy:

1) Relying on the Secrecy of the Algorithm: According to
one of the oldest security rules, stated by Kerckhoffs in
1883, an encryption or security technique should rely not on
the secrecy of its algorithm but on the mathematical
complexity of that algorithm: the system must not require
secrecy, and it must be able to fall into the hands of the
enemy without causing trouble. A watermarking approach should
therefore depend on the secrecy of neither its insertion nor
its extraction algorithm. It should instead depend on one of
the system properties to protect the authorship data.

2) Level of Reliability: This is a very important measure,
which can be divided into two main aspects: (1) robustness,
which measures the strength of the hidden mark against
attacks and the percentage of watermarked designs that might
go undetected; and (2) false positives, which occur whenever
the detector finds a mark in a non-watermarked design. Both
measures are related to attack analysis and are discussed in
detail in the subsection on attacks.

3) Affecting the Design Functionality: Testing and
verification of hardware systems is an extremely complicated
task. A watermarking technique must therefore be totally
sound, in the sense that introducing the watermark does not
alter the behavior of the system. Watermarking techniques
should prove their soundness against this criterion,
preferably mathematically.

4) Preventing Intruder from Re-embedding Another Watermark:
Because watermarking is a passive technique, one of the main
challenges of watermarking schemes is the authenticity of the
watermark. Scheme designers need to find techniques to
protect their designs from intruders who may try to embed
another watermark in the design, if only to destroy the
authenticity of the original watermark in court.

5) Embedding Enough Data to Identify Ownership: The
watermarking scheme should add enough data to identify the
owner of the design. This data should be concrete enough to
be considered as evidence in court. Nevertheless, the data
should be small enough neither to impose a high overhead on
the design size nor to affect the design performance. The
amount of data embedded is one of the measures used to
differentiate between IP watermarking techniques.

6) Implementation Overhead: Watermarking a design is a
complementary process that increases its competitiveness, but
affecting the design performance or incurring a high overhead
in the insertion process would be considered a real drawback.
For IP watermarking, we consider the area, power and delay
overheads compared to the original design without
watermarking.

7) Detection and Tracking: Watermark insertion is only half
the process; tracking and detection form the second important
aspect of any watermarking technique. Tracking and detecting
the watermark or its traces after possible attacks is
essential. This is considered one of the main aspects for
judging watermarking techniques.

8) Asymmetry: Since Diffie and Hellman presented their
public-key encryption scheme, public-key techniques have
proven their strength, especially in non-secure environments.
Sharing IP designs poses the same threats as sharing other
secret data in the public domain. Third parties, such as
brokers and sub-contractors, need to know the watermark key
for tracking purposes, but these parties are not considered
secure entities. Leakage and theft of IPs can still happen
through in-house workers who may know the watermarking key.
Asymmetric watermarking is still considered a challenge in
many media domains. Deleting or attacking a watermark mostly
depends on knowing that it is present and where it might be
located.


C. Attacks 


Digital watermarking attacks are categorized into four main
classes:


• Unauthorized removal
• Unauthorized embedding
• Unauthorized detection
• System attacks


The same categorization applies to IP watermarking schemes.
System attacks target the concept of watermarking itself, for
example by attacking the cryptographic basis of the
watermarking or, in the case of video media, by physically
removing the chip that checks the watermark. This kind of
attack cannot be avoided by the watermarking scheme itself.
The VSI Alliance IP protection scheme addresses it by
protecting the design through different transactions.

Sharing IP designs poses high security risks. IPs need time
and effort to be designed and verified, but they can easily
be stolen or forged. Digital watermarking, used with most
shared digital media, is considered a solution for the
copyright protection of IP blocks. It was introduced as a way
to protect the rights of both the owner and the customer
against forging or illegal distribution of the IP blocks. In
this paper, we have first introduced a set of evaluation
criteria, then surveyed the current state of the art in IP
digital watermarking, and finally compared the different
techniques available, discussing their major advantages and
disadvantages. IP watermarking schemes still need more
development before they can be integrated in the design
cycle.

Future IP watermarking schemes should be robust enough to
secure the design, yet they should add little overhead to the
final watermarked product. We
believe that the different techniques introduced should be
used both hierarchically, to protect the design at different
levels, and modularly, to protect different parts of the IP
design. Adding a hierarchy of watermarks through the design
cycle can give a watermark that is more robust against
attacks. Starting from the high levels of the design (i.e.,
system level) and integrating the watermark through many
design levels ensures robustness, which decreases the risk of
the watermark being destroyed. These watermarks should be
easily detectable at lower design levels to ensure proper
tracking. Efficient watermarking schemes should also use a
public-key encryption algorithm in the watermarking process,
thus allowing third-party entities (such as brokers) to enter
the distribution cycle without security hazards.

Finally, IP watermarking developers lack a strong benchmark
like those available, e.g., for photos. Such a benchmark
would be a balanced measure of the strength of different
approaches. Benchmarking an IP watermarking scheme is harder
than benchmarking, for instance, photo watermarking, because
the watermark might be spread over many design levels, given
the different nature of the design span of SOCs.

Fig. 3. Video Watermarking


IV. VIDEO WATERMARKING 


A watermark can be inserted directly in the raw video data,
integrated during the encoding process, or applied after the
video data has been compressed. The growing popularity of
video-based applications such as Internet multimedia,
wireless video, personal video recorders, video-on-demand,
set-top boxes, videophones and videoconferencing creates a
demand for much higher compression, to meet bandwidth
criteria while keeping the best possible video quality.
Different video encoder-decoders (CODECs) have evolved to
meet the current requirements of video-based products. Among
the available standards, H.264 / Advanced Video Coding (AVC)
is becoming an important alternative because of its reduced
bandwidth, better image quality in terms of peak
signal-to-noise ratio (PSNR) and network friendliness, but it
requires higher computational complexity.
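
Since image quality is quoted here in terms of PSNR, a minimal sketch of how that figure is usually computed may help. It assumes 8-bit samples (peak value 255) stored as flat lists of integers; the function name `psnr` and the toy frame are illustrative and not taken from any codec or from the surveyed papers.

```python
import math

def psnr(original, marked, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    if len(original) != len(marked) or not original:
        raise ValueError("frames must be non-empty and of equal size")
    # Mean squared error between the two frames.
    mse = sum((a - b) ** 2 for a, b in zip(original, marked)) / len(original)
    if mse == 0:
        return math.inf                 # identical frames
    return 10 * math.log10(peak ** 2 / mse)

# Toy frame (assumed data) and a copy with every least significant bit flipped,
# i.e. an error of exactly one grey level per sample.
frame = [(13 * i) % 256 for i in range(256)]
marked = [p ^ 1 for p in frame]

print(round(psnr(frame, marked), 2))    # 48.13
```

A higher PSNR means the watermarked frame is closer to the original, which is why it is commonly used as a rough proxy for imperceptibility.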

Different watermarking techniques have been proposed for
different video CODECs, but only a few works on H.264/AVC
can be found in the literature. H.264/AVC uses different
transforms and block sizes from the MPEG series, so new
algorithms must be developed to integrate robust watermarking
techniques for the different profiles of H.264/AVC.


A. Terminologies 


Video watermarking describes the process of embedding
information in video data. The important terms pertaining to
digital video watermarking are:


• Digital Video: A video sequence is a collection of
consecutive still images equally spaced in time.

• Payload: This is the amount of information that can be
stored in a watermark. An important concept regarding the
video watermarking payload is watermark granularity, which
can be defined as how much data is required for embedding one
unit of watermark information.

• Perceptibility: A video watermarking methodology is called
imperceptible if humans cannot distinguish between the
original video and the video with the inserted watermark.

• Robustness: A fragile watermark should not be robust
against intentional modification techniques, as failure to
detect the watermark signifies that the received data is no
longer authentic. In applications such as copyright
protection, it is desirable that the watermark always remains
in the video data, even if the video data is subjected to
intentional and unintentional signal processing attacks.
Hence, depending on the requirements of the application, the
watermark is embedded in a robust, semi-fragile or fragile
manner.

• Security: The security of the watermarking algorithm is
ensured in the same way as in encryption methodology.
According to the Kerckhoffs assumption, the algorithm for
watermark embedding can be considered to be public, whereas
the security depends solely on the choice of a key from a
large key space.


B. Techniques 


• In principle any image watermarking technique can be
extended to watermark videos, but in reality video
watermarking techniques face challenges beyond those of image
watermarking schemes, such as the large volume of inherently
redundant data between frames, the imbalance between moving
and motionless regions, and real-time requirements in video
broadcasting. Watermarked video sequences are highly
susceptible to pirate attacks such as frame averaging, frame
swapping, statistical analysis, digital-analog (AD/DA)
conversion, and lossy compression.

• Video watermarking applications can be grouped into
security-related applications, such as copy control,
fingerprinting, ownership identification, authentication and
tamper resistance, and value-added applications, such as
legacy system enhancement, database linking, video tagging,
digital video broadcast monitoring and Media Bridge.

• Apart from robustness, reliability, imperceptibility and
practicality, video watermarking algorithms should also
address issues such as localized detection, real-time
algorithm complexity, synchronization recovery, effects of
floating-point representation, power dissipation, etc.

[Figure: classification of watermarking by domain (spatial,
frequency), document (text, image, audio, video), perception
(invisible, visible) and application (source based,
destination based), with further labels robust, fragile,
private, public, invertible, non-invertible, quasi-invertible
and nonquasi-invertible.]

• Techniques are classified into pixel domain and transform
domain techniques. In the pixel domain the watermark is
embedded in the source video by simple addition or by bit
replacement at selected pixel positions. The main advantages
of pixel domain techniques are that they are conceptually
simple and that their time complexity is low, which favours
real-time implementations. However, these techniques
generally fail to meet robustness and imperceptibility
requirements adequately.

• In transform domain methods, the host signal is
transformed into a different domain and the watermark is
embedded in selected coefficients. Commonly used transforms
are the discrete cosine transform (DCT) and the discrete
wavelet transform (DWT). Detection is generally performed by
transforming the received signal into the appropriate domain
and searching for the watermarking patterns or attributes.
The main advantage of transform domain watermarking is the
easy applicability of special transform domain properties.
For example, working in the frequency domain enables us to
apply more advanced properties of the human visual system
(HVS) to achieve better robustness and imperceptibility.


V. TECHNIQUES, APPLICATIONS AND ATTACKS 


As an emerging technology, digital watermarking draws on the
ideas and theories of several subjects, such as signal
processing, cryptography, probability theory and stochastic
theory, network technology and algorithm design. Digital
watermarking hides copyright information in digital data by
means of a certain algorithm. The secret information to be
embedded can be some text, an author's serial number, a
company logo, or images of special importance. This secret
information is embedded in the digital data (images, audio,
and video) to ensure security, data authentication,
identification of the owner and copyright protection. The
watermark can be hidden in the digital data either visibly or
invisibly. Strong watermark embedding requires a good
watermarking technique. A watermark can be embedded in either
the spatial or the frequency domain. The two domains are
different, have their own pros and cons, and are used in
different scenarios.


A. Techniques 


Watermarking is a method of hiding secret information in
digital media using a strong and appropriate algorithm. The
algorithm plays a vital role in watermarking: if the
watermarking technique used is efficient and strong, the
watermark embedded with it cannot easily be detected. An
attacker can destroy or detect the secret information only if
he knows the algorithm; otherwise it is difficult to recover
the watermark. Various algorithms are in use today to hide
the information. They fall into two domains, spatial and
frequency.

1) Spatial Domain: Spatial domain digital watermarking
algorithms load the raw watermark data directly into the
original image. Spatial watermarking can also be applied
using color separation, so that the watermark appears in only
one of the color bands. This renders the watermark visibly
subtle, so that it is difficult to detect under regular
viewing. Spatial domain processing means manipulating or
changing an image that represents an object in space in order
to enhance the image for a given application. The techniques
are based on direct manipulation of the pixels in an image.
Some of the main algorithms are discussed below:


• Additive Watermarking: The most straightforward method for
embedding a watermark in the spatial domain is to add a
pseudo-random noise pattern to the intensity of the image
pixels. The noise signal usually consists of integers such as
(-1, 0, 1), or sometimes floating-point numbers. To ensure
that the watermark can be detected, the noise is generated
from a key, such that the correlation between the patterns
generated by different keys is very low.

• Least Significant Bit: An old and popular technique embeds
the watermark in the least significant bit (LSB) of pixels.
This method is easy to implement and does not introduce
serious distortion into the image; however, it is not very
robust against attacks. The watermark is embedded by choosing
a subset of image pixels and substituting the least
significant bit of each chosen pixel with a watermark bit.
The watermark may be spread throughout the image or placed in
selected locations. Such primitive techniques are vulnerable
to attacks, and the watermark can easily be destroyed. The
approach is very sensitive to noise and common signal
processing and cannot be used in practical applications.
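
The two spatial-domain schemes listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of any particular paper: the image is a flat list of 8-bit grayscale values, the key simply seeds Python's `random.Random`, and the names `pn_pattern`, `embed_additive`, `correlation`, `embed_lsb`, `extract_lsb`, the embedding strength of 2 and the toy host are all hypothetical choices.

```python
import random

def pn_pattern(key, n):
    """Key-seeded pseudo-random noise pattern with values in (-1, 0, 1)."""
    rng = random.Random(key)
    return [rng.choice((-1, 0, 1)) for _ in range(n)]

def embed_additive(pixels, key, strength=2):
    """Add the scaled noise pattern to the pixel intensities (clipped to 0..255)."""
    pattern = pn_pattern(key, len(pixels))
    return [min(255, max(0, p + strength * w)) for p, w in zip(pixels, pattern)]

def correlation(pixels, key):
    """Correlate the mean-removed pixels with the pattern generated by `key`."""
    pattern = pn_pattern(key, len(pixels))
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) * w for p, w in zip(pixels, pattern)) / len(pixels)

def embed_lsb(pixels, bits, key):
    """Write watermark bits into the LSBs of a key-selected subset of pixels."""
    positions = random.Random(key).sample(range(len(pixels)), len(bits))
    marked = list(pixels)
    for pos, bit in zip(positions, bits):
        marked[pos] = (marked[pos] & 0xFE) | bit
    return marked

def extract_lsb(pixels, n_bits, key):
    """Read the bits back from the same key-selected pixel positions."""
    positions = random.Random(key).sample(range(len(pixels)), n_bits)
    return [pixels[pos] & 1 for pos in positions]

# Smooth toy host (assumed data); a smooth host keeps host interference small.
host = [128 + (i % 8) for i in range(4096)]

marked = embed_additive(host, key=1234)
right = correlation(marked, 1234)   # pattern of the embedding key
wrong = correlation(marked, 9999)   # pattern of an unrelated key
print(right > 0.5, wrong > 0.5)

bits = [1, 0, 1, 1, 0, 0, 1, 0]
recovered = extract_lsb(embed_lsb(host, bits, key=42), len(bits), key=42)
print(recovered == bits)            # True
```

With the correct key the correlation should sit well above the threshold, while an unrelated key should give a value close to zero; the threshold of 0.5 is an arbitrary choice for this toy host.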


2) Frequency Domain: Compared to spatial-domain methods,
frequency-domain methods are more widely applied. The aim is
to embed the watermark in the spectral coefficients of the
image. The most commonly used transforms are the Discrete
Cosine Transform (DCT), the Discrete Fourier Transform (DFT)
and the Discrete Wavelet Transform (DWT). The reason for
watermarking in the frequency domain is that the
characteristics of the human visual system (HVS) are better
captured by the spectral coefficients.


3) Transform Domain: Transform domain techniques embed a
watermark with a visually recognizable pattern in the image
as a set of independent and identically distributed
sequences.
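
To make the idea of embedding in spectral coefficients concrete, here is a small sketch. It is an assumption-laden illustration rather than a published scheme: it uses a one-dimensional orthonormal DCT on a single row of eight pixels (practical image schemes normally use two-dimensional blocks), hides one bit by raising or lowering one mid-frequency coefficient, and extracts it non-blindly by comparing against the original block. The names `dct`, `idct`, `embed_bit`, `extract_bit`, the coefficient index 3 and the strength 8.0 are hypothetical.

```python
import math

N = 8  # block length

def _scale(k):
    """Scaling factor that makes the DCT orthonormal."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def dct(block):
    """Orthonormal 1-D DCT-II of an N-sample block."""
    return [_scale(k) * sum(x * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                            for n, x in enumerate(block))
            for k in range(N)]

def idct(coeffs):
    """Inverse of dct(): reconstruct the N samples from the coefficients."""
    return [sum(_scale(k) * c * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k, c in enumerate(coeffs))
            for n in range(N)]

def embed_bit(block, bit, index=3, alpha=8.0):
    """Hide one bit by shifting a mid-frequency coefficient by +/- alpha."""
    coeffs = dct(block)
    coeffs[index] += alpha if bit else -alpha
    # Back to the pixel domain, rounded and clipped to 8-bit values.
    return [min(255, max(0, round(v))) for v in idct(coeffs)]

def extract_bit(marked_block, original_block, index=3):
    """Non-blind extraction: compare the coefficient with the original's."""
    return 1 if dct(marked_block)[index] > dct(original_block)[index] else 0

block = [52, 55, 61, 66, 70, 61, 64, 73]   # toy pixel row (assumed data)
for bit in (0, 1):
    marked = embed_bit(block, bit)
    print(bit, extract_bit(marked, block), marked)
```

Because the change is spread over all eight pixels by the inverse transform, each pixel moves by only a few grey levels, which is one informal way to see why coefficient-domain embedding tends to be less visible than changing individual pixels by the same total energy.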


B. Applications 


1) Copyright Protection: Digital watermarking can be used 
to identify and protect copyright ownership. Digital content 
can be embedded with watermarks depicting metadata identi- 
fying the copyright owners. 

2) Digital Rights Management: Digital rights management
(DRM) can be defined as the description, identification,
trading, protection, monitoring, and tracking of all forms of
usage of tangible and intangible assets. It concerns the
management of digital rights and the enforcement of rights
digitally.

3) Copy Protection: Digital content can be watermarked to 
indicate that the digital content cannot be illegally replicated. 
Devices capable of replication can then detect such watermarks 
and prevent unauthorized replication of the content. 

4) Broadcast Monitoring: The wavelet transform is a modern
technique frequently used in digital image processing,
compression, watermarking, etc. The transforms are based on
small waves, called wavelets, of varying frequency and
limited duration.

5) Medical Application: Names of the patients can be 
printed on the X-ray reports and MRI scans using techniques 
of visible watermarking. The medical reports play a very 
important role in the treatment offered to the patient. If there 
is a mix up in the reports of two patients this could lead to a 
disaster. 

6) Image and Content Authentication: In an image au- 
thentication application the intent is to detect modifications 
to the data. The characteristics of the image, such as its 
edges, are embedded and compared with the current images 
for differences. A solution to this problem could be borrowed 
from cryptography, where digital signature has been studied 
as a message authentication method. One example of digital 
signature technology being used for image authentication is 
the trustworthy digital camera. 

7) Media Forensics: Forensic watermark applications enhance
a content owner's ability to detect and respond to misuse of
its assets. Forensic watermarking is used not only to gather
evidence for criminal proceedings, but also to enforce
contractual usage agreements between a content owner and
those who use its content.


C. Attacks 


A watermarked object is likely to be subjected to various
attacks, whether malicious and intentional or unintentional.
The wide availability of image processing software has made
it possible to attack the robustness of watermarking systems.
The aim of these attacks is to prevent the watermark from
performing its intended purpose. A brief introduction to the
various types of watermarking attacks follows.

1) Removal Attack: Removal attacks intend to remove the
watermark data from the watermarked object. Such attacks
exploit the fact that the watermark is usually an additive
noise signal present in the host signal.

2) Interference Attack: Interference attacks are those which 
add additional noise to the watermarked object. Lossy com- 
pression, quantization, collusion, denoising, remodulation, av- 
eraging, and noise storm are some examples of this category 
of attacks. 

3) Geometric Attack: All manipulations that affect the 
geometry of the image such as flipping, rotation, cropping, 
etc. should be detectable. A cropping attack from the right- 
hand side and the bottom of the image is an example of this 
attack. 

4) Forgery Attack: Forgery attacks that result in object
insertion and deletion or in scene background changes are all
tantamount to substitution.

5) Security Attack: In particular, if the watermarking
algorithm is known, an attacker can go further and try to
perform modifications that render the watermark invalid, or
try to estimate and modify the watermark. In this case, we
talk about an attack on security. The watermarking algorithm
is considered secure if the embedded information cannot be
destroyed, detected or forged.

6) Protocol Attack: Protocol attacks aim neither at
destroying the embedded information nor at disabling its
detection (deactivation of the watermark). Instead, they take
advantage of semantic deficits in the watermark's
implementation. Consequently, a robust watermark must be
neither invertible nor copyable. A copy attack, for example,
would aim at copying a watermark from one media object into
another without knowledge of the secret key.

7) Cryptographic Attack: Cryptographic attacks deal with
cracking the security. For example, finding the secret
watermarking key by an exhaustive brute-force search is a
cryptographic attack. Another example of this type of attack
is the oracle attack, in which a non-watermarked object is
created when a public watermark detector device is available.
These attacks are similar to the attacks used in
cryptography.

8) Passive Attack: In this case, the attacker is not trying
to remove the watermark but simply attempting to determine
whether a given mark is present or not. Cox et al. (2002)
suggest that protection against passive attacks is of the
utmost importance in covert communications, where the simple
knowledge of the presence of a watermark is often more than
one wants to grant.

9) Collusion Attack: In collusion attacks, the goal of the
hacker is the same as for active attacks, but the method is
slightly different. In order to remove the watermark, the
hacker uses several copies of the same data, each containing
a different watermark, to construct a new copy without any
watermark. This is a problem in fingerprinting applications
(e.g., in the film industry) but is not widespread, because
the attacker must have access to multiple copies of the same
data and the number needed can be quite large.


10) Image Degradation: These types of attack damage robust
watermarks by removing parts of the image. The parts that are
removed may carry watermark information. Examples of these
operations are partial cropping, row removal and column
removal. Insertion of Gaussian noise also comes under this
category; here the image is degraded by adding noise
controlled by its mean and its variance.

11) Image Enhancement: These attacks are convolution 
operations that desynchronize the watermark information in an 
image. These attacks include histogram equalization, sharpen- 
ing, smoothing, median filtering and contrast enhancement. 

12) Image Compression: In order to reduce the storage
space and the bandwidth cost of transmitting images, images
are generally compressed with the JPEG and JPEG2000
compression techniques. These lossy compression methods are
more harmful than lossless compression methods. With lossless
compression, the watermark information can be recovered by the
inverse operation. Lossy compression techniques, however,
produce irreversible changes to the images, so the probability
of recovering the watermark information is very low.

13) Image Transformation: These types of attacks are
also called synchronization attacks or geometrical attacks.
The well-known StirMark software uses small local geometrical
distortions to invalidate watermark detection. Geometrical
attacks include rotation, scaling and translation, also called
RST attacks. Some researchers focus on RST robustness
when designing robust watermarking systems, because
it is a fundamental problem. Besides RST transforms, image
transformations also include other transforms such as aspect
ratio change, shearing, reflection and projection.
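
As an illustration of the collusion attack described in item 9 above, the following minimal Python sketch averages several differently watermarked copies of the same host signal. The additive embedding, the correlation detector, the signal length and the number of copies are all assumptions made for this sketch; the function names are hypothetical and do not come from any scheme surveyed in this paper.

```python
import random

def make_watermark(n, seed):
    """Pseudo-random +/-1 sequence; the seed stands in for the secret key."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def embed(host, watermark, strength=2.0):
    """Additive embedding: shift each sample by strength * watermark value."""
    return [h + strength * w for h, w in zip(host, watermark)]

def detect(signal, host, watermark):
    """Informed detector: correlate (signal - host) with the watermark."""
    n = len(host)
    return sum((s - h) * w for s, h, w in zip(signal, host, watermark)) / n

def collude(copies):
    """Collusion by averaging the copies sample by sample."""
    k = len(copies)
    return [sum(samples) / k for samples in zip(*copies)]

n = 4096
rng = random.Random(0)
host = [rng.uniform(0, 255) for _ in range(n)]         # stand-in for pixel values
marks = [make_watermark(n, seed) for seed in range(1, 9)]
copies = [embed(host, w) for w in marks]                # eight fingerprinted copies

single = detect(copies[0], host, marks[0])              # response on an intact copy
colluded = detect(collude(copies), host, marks[0])      # response after averaging
print(f"intact copy: {single:.3f}, after collusion: {colluded:.3f}")
```

Under these assumptions, each individual watermark contributes only about one eighth of its original strength to the averaged copy, which is why the detector response drops sharply. It also shows why the attack needs many copies to be effective.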


VI. CONCLUSION 


Several watermarking algorithms exist. In contrast to
spatial-domain-based watermarking, frequency-domain-based
techniques can embed more bits of watermark and have
proved to be more robust to attacks. On-line application of
watermarking for video in the spatial domain becomes
cumbersome because of the high computational complexity
involved. Similarly, watermarking in the DCT domain needs
preprocessing operations such as inverse entropy coding and
inverse quantization. It therefore appears clear that there is
no single best watermarking technique; the optimal scheme to
be employed depends on the medium type, on the application
requirements, on the tradeoff between robustness and
computational complexity, and on whether on-the-fly or
pre/post-processing operations are possible. Sharing IP designs
poses high security risks. IPs need time and effort to be
designed and verified, but they can be easily stolen or forged.
Digital watermarking, used with most shared digital media, is
considered a solution for the copyright protection of IP blocks.
It was introduced as a way to protect the rights of both the
owner and the customer against forging or illegal distribution
of the IP blocks.
In this paper, we have first introduced a set of evaluation
criteria, then surveyed the current state of the art in IP digital
watermarking, and finally compared the different techniques
available, discussing their major advantages and disadvantages.
Image watermarking, in particular, embeds owner identification
in the host image.

Farha Kabeer et al., “A Survey of Digital Watermarking Techniques and Applications”


Abstract—Electronic Commerce is the process of combining
modern technology and electronic devices with commercial
trade and services with the help of the Internet. Security is the
first-priority constraint to be managed when the Internet is
brought into financial transactions and trade. As a new business
model, E-Commerce has provided a more convenient transaction
mode and lower transaction costs. However, it has to overcome
the obstacle of security, which is both the most important
constraint and the hardest one to satisfy. Security is often cited
as a major barrier to the further development of E-Commerce on
the open Internet, with problems such as the disclosure of client
information, credit card fraud, and so on. These problems make
people wary of E-Commerce and reluctant to trade and pay over
the Internet. Security has become one of the bottlenecks that
restrict the development of E-Commerce.

Index Terms—Security Issues, Challenges and Solutions for 
E-Commerce Applications over Web 


I. INTRODUCTION 


The Internet is a public network consisting of thousands
of private computer networks connected together. A private
computer network is exposed to potential threats from
anywhere on the public network, and it may be difficult to trace
the source of a cybercrime. Different methods are suitable for
specific situations, but there is no single method that
foils all Internet fraud in every situation, so current Internet
security policy and technology fail to meet end-user needs.
With the fast growth of the Internet and the World Wide Web,
computer and information systems have increasingly become
the targets of criminal attacks and intrusions. Network security
is the sum of all measures taken to prevent data loss, ranging
from creating user accounts to hiring loyal employees and keeping
the server locked in a room. Security can include encryption
and decryption of data, digital signatures, the Secure Sockets
Layer, biometric measures and firewalls. Encryption plays a major
role; it is the process of converting an original message, known
as plaintext, into an unreadable form known as ciphertext. The
rapid growth in the number of Internet users has opened a new
business opportunity to the whole world. E-Commerce received
special attention when IBM put forward the concept in the 1990s.
As a new business model, E-Commerce has provided a more
convenient transaction mode and lower transaction costs.


II. ADVANTAGES OF E-COMMERCE 


The main advantage of e-commerce is its ability to reach a 
global market, without necessarily implying a large financial 
investment. The limits of this type of commerce are not 
defined geographically, which allows consumers to make a 
global choice, obtain the necessary information and compare 
offers from all potential suppliers, regardless of their locations. 
By allowing direct interaction with the final consumer, e- 
commerce shortens the product distribution chain, sometimes 
even eliminating it completely. This way, a direct channel 
between the producer or service provider and the final user is 
created, enabling them to offer products and services that suit 
the individual preferences of the target market. E-commerce
allows suppliers to be closer to their customers, resulting in
increased productivity and competitiveness for companies. As
a result, consumers benefit from better service quality, closer
contact with suppliers, and more efficient pre-sales and
post-sales support. With these new
forms of electronic commerce, consumers now have virtual
stores that are open 24 hours a day. Cost reduction is another
very important advantage normally associated with electronic 
commerce. The more trivial a particular business process 
is, the greater the likelihood of its success, resulting in a 
significant reduction of transaction costs and, of course, of 
the prices charged to customers. 


III. DISADVANTAGES OF E-COMMERCE 


E-commerce suffers from a lack of system security, reliability,
standards, and some communication protocols. It is difficult to
integrate the Internet and e-commerce (EC) software with some
existing applications and databases. Market culture is averse to
electronic commerce, because customers cannot touch or try the
products. Other disadvantages are the users' loss of privacy and
the loss of the cultural and economic identity of regions and
countries.


IV. CATEGORIES OF E-COMMERCE


A. Business-to-Business (B2B) 


Business-to-Business (B2B) e-commerce encompasses all 
electronic transactions of goods or services conducted between 
companies. It includes the EDI transactions and electronic 
market transactions between organizations. Producers and tra- 
ditional commerce wholesalers typically operate with this type 
of electronic commerce. 


B. Business-to-Consumer (B2C) 


The Business-to-Consumer type of e-commerce is distin- 
guished by the establishment of electronic business relation- 
ships between businesses and final consumers. It corresponds 
to the retail section of e-commerce, where traditional retail 
trade normally operates. 

These types of relationships can be easier and more dy- 
namic, but also more sporadic or discontinued. This type of 
commerce has developed greatly, due to the advent of the 
web, and there are already many virtual stores and malls on 
the Internet, which sell all kinds of consumer goods, such 
as computers, software, books, shoes, cars, food, financial 
products, digital publications, etc. 

Compared with retail buying in traditional commerce,
the consumer usually has more information available.
There is also a widespread idea that buying online is
cheaper, without sacrificing equally personalized customer
service or the quick processing and delivery of the order.


C. Consumer-to-Consumer (C2C) 


Consumer-to-Consumer (C2C) type e-commerce encom- 
passes all electronic transactions of goods or services con- 
ducted among consumers. Generally, these transactions are 
conducted through a third party, which provides the online 
platform where the transactions are actually carried out. 


D. Consumer-to-Business (C2B) 


In C2B there is a complete reversal of the traditional 
sense of exchanging goods. This type of e-commerce is very
common in crowdsourcing-based projects. A large number
of individuals make their services or products available for 
purchase from companies seeking precisely these types of 
services or products. 


E. Business-to-Administration (B2A) 


This part of e-commerce encompasses all transactions con- 
ducted online between companies and public administration. 
This is an area that involves a large amount and a variety of 
services, particularly in areas such as fiscal, social security, 
employment, legal documents and registers, etc. These types 
of services have increased considerably in recent years with 
investments made in e-government. 


F. Consumer-to-Administration (C2A) 


The Consumer-to-Administration model encompasses all 
electronic transactions conducted between individuals and 
public administration. 

Both models involving Public Administration (B2A and 
C2A) are strongly associated with the idea of efficiency and 
easy usability of the services provided to citizens by the gov- 
ernment, with the support of information and communication 
technologies. 


V. MODES OF E-COMMERCE PAYMENT 
A. Credit Card


When a customer purchases a product with a credit card, the
card-issuing bank pays on behalf of the customer, and the customer
has a certain period of time within which to pay the credit
card bill.


B. Debit Card 


With a debit card, the amount is deducted immediately from the
bank account linked to the card, so there must be a sufficient
balance in the account for the transaction to be completed.


C. Smart Card 


A smart card is similar to a credit card or a debit card in
appearance, but it has a small microprocessor chip embedded
in it. It has the capacity to store a customer's work-related
and/or personal information. Smart cards are also used to store
money, and the amount is deducted after every transaction.


D. E-Money 


E-Money transactions refer to situations where payment 
is done over the network and the amount gets transferred 
from one financial body to another financial body without any 
involvement of a middleman. 


E. Electronic Fund Transfer 


It is a very popular electronic payment method to transfer 
money from one bank account to another bank account. 
Accounts can be in the same bank or different banks. Fund 
transfer can be done using ATM (Automated Teller Machine) 
or using a computer. 


VI. SECURITY ISSUES 
A. Data Confidentiality 


Data confidentiality means that the information stored on a
system is protected against unintended or unauthorized access.
Confidentiality is roughly equivalent to privacy. Measures
undertaken to ensure confidentiality are designed to prevent
sensitive information from reaching the wrong people, while
making sure that the right people can in fact get it: access must
be restricted to those authorized to view the data in question.
It is also common for data to be categorized according to the
amount and type of damage that could be done should it fall into
unintended hands. More or less stringent measures can then
be implemented according to those categories.

Data encryption is a common method of ensuring confidentiality.
User IDs and passwords constitute a standard procedure;
two-factor authentication is becoming the norm. Other
options include biometric verification and security tokens, key
fobs or soft tokens. In addition, users can take precautions to
minimize the number of places where the information appears
and the number of times it is actually transmitted to complete a
required transaction. Extra measures might be taken in the case
of extremely sensitive documents, such as storing them
only on air-gapped computers or disconnected storage devices
or, for highly sensitive information, in hard copy form only.


B. Data Integrity 


Integrity services give assurance that the data received
are exactly as sent by an authorized entity.

Integrity involves maintaining the consistency, accuracy, 
and trustworthiness of data over its entire life cycle. Data 
must not be changed in transit, and steps must be taken to 
ensure that data cannot be altered by unauthorized people 
(for example, in a breach of confidentiality). These measures 
include file permissions and user access controls. Version
control may be used to prevent erroneous changes or accidental
deletion by authorized users from becoming a problem. In addition,
some means must be in place to detect any changes in data 
that might occur as a result of non-human-caused events such 
as an electromagnetic pulse (EMP) or server crash. Some 
data might include checksums, even cryptographic checksums, 
for verification of integrity. Backups or redundancies must be 
available to restore the affected data to its correct state. 
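
The checksum idea mentioned above can be sketched in a few lines of Python using the standard `hashlib` and `hmac` modules. The order string and the shared key below are made-up illustrative values, and the helper names are hypothetical; this is only a sketch of how a plain checksum and a keyed (cryptographic) checksum detect alteration, not a complete integrity service.

```python
import hashlib
import hmac

def checksum(data: bytes) -> str:
    """Plain SHA-256 checksum: detects accidental changes."""
    return hashlib.sha256(data).hexdigest()

def keyed_checksum(key: bytes, data: bytes) -> str:
    """HMAC-SHA-256: a checksum that cannot be recomputed without the key."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()

def is_intact(key: bytes, data: bytes, expected_tag: str) -> bool:
    """Compare in constant time to avoid leaking where the tags differ."""
    return hmac.compare_digest(keyed_checksum(key, data), expected_tag)

# Illustrative values only (assumptions of this sketch).
key = b"shared-secret-known-to-both-parties"
order = b"order=1042;amount=250.00;currency=INR"
tag = keyed_checksum(key, order)

tampered = order.replace(b"250.00", b"950.00")
print(is_intact(key, order, tag))     # unchanged data passes the check
print(is_intact(key, tampered, tag))  # altered data fails the check
```

A plain checksum is enough to catch accidental corruption, such as that caused by a server crash. The keyed variant is needed when an attacker who alters the data could also recompute an unkeyed checksum.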


C. Availability 


Availability is best ensured by rigorously maintaining all 
hardware, performing hardware repairs immediately when 
needed and maintaining a correctly functioning operating 
system environment that is free of software conflicts. It is
also important to keep current with all necessary system
upgrades. Providing adequate communication bandwidth and
preventing the occurrence of bottlenecks are equally important.
Redundancy, failover, RAID and even high-availability clusters can
mitigate serious consequences when hardware issues do occur.
Fast and adaptive disaster recovery is essential for the worst 
case scenarios; that capacity is reliant on the existence of 
a comprehensive disaster recovery plan (DRP). Safeguards 
against data loss or interruptions in connections must include 
unpredictable events such as natural disasters and fire. To 
prevent data loss from such occurrences, a backup copy may 
be stored in a geographically-isolated location, perhaps even 
in a fireproof, waterproof safe. Extra security equipment or 
software such as firewalls and proxy servers can guard against 
downtime and unreachable data due to malicious actions such 
as denial-of-service (DoS) attacks and network intrusions. 


D. Access Control 


Access control is the prevention of unauthorized use of
resources. An e-commerce system must establish mutual trust and
secure access between the parties in a transaction by
authenticating users, authorizing access, and enforcing security
features.
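
The two steps named above, authenticating a user and then authorizing an action, can be sketched as follows. The user store, the role names, the permission table and the iteration count are all hypothetical values chosen for illustration; only the `hashlib`, `hmac` and `os` calls are standard Python.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for this sketch

# Hypothetical in-memory stores: name -> (salt, password hash, role),
# and role -> set of permitted actions.
USERS = {}
PERMISSIONS = {
    "customer": {"view_catalog", "place_order"},
    "merchant": {"view_catalog", "update_catalog", "view_orders"},
}

def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a slow, salted hash so that stored passwords are never in clear."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

def register(name: str, password: str, role: str) -> None:
    salt = os.urandom(16)
    USERS[name] = (salt, hash_password(password, salt), role)

def authenticate(name: str, password: str) -> bool:
    """Authentication: is this really the registered user?"""
    if name not in USERS:
        return False
    salt, stored, _role = USERS[name]
    return hmac.compare_digest(stored, hash_password(password, salt))

def authorize(name: str, action: str) -> bool:
    """Authorization: is this user's role allowed to perform the action?"""
    if name not in USERS:
        return False
    return action in PERMISSIONS.get(USERS[name][2], set())

def access(name: str, password: str, action: str) -> bool:
    return authenticate(name, password) and authorize(name, action)

register("alice", "correct horse", "customer")
print(access("alice", "correct horse", "place_order"))     # allowed
print(access("alice", "correct horse", "update_catalog"))  # denied: wrong role
```

Keeping the two checks in separate functions mirrors the distinction drawn in the text: a correct password alone does not grant access to every resource.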


E. Non-repudiation 


Non-repudiation provides protection against denial, by one of
the entities involved in a communication, of having participated
in all or part of the communication. That is, it is the assurance
that someone cannot deny something. Typically, non-repudiation
refers to the ability to ensure that a party to a contract or a
communication cannot deny the authenticity of their signature on
a document or the sending of a message that they originated.
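
Non-repudiation is usually obtained with digital signatures. The following is a toy sketch of a textbook RSA signature, given only to show why a valid signature ties a message to the holder of the private key. It is not secure: it uses no padding, and the key is built from two publicly known Mersenne primes chosen purely for convenience (both are assumptions of this sketch). A real system would use a vetted cryptographic library.

```python
import hashlib
import math

# Toy key generation (assumption: two known Mersenne primes stand in for
# the large random primes that a real key would use).
P = 2**107 - 1
Q = 2**127 - 1
N = P * Q                      # public modulus
PHI = (P - 1) * (Q - 1)
E = 65537                      # public exponent
assert math.gcd(E, PHI) == 1
D = pow(E, -1, PHI)            # private exponent (requires Python 3.8+)

def digest_as_int(message: bytes) -> int:
    """Hash the message and map the digest to an integer below N."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Only the holder of the private exponent D can compute this value."""
    return pow(digest_as_int(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Anyone with the public pair (E, N) can check the signature."""
    return pow(signature, E, N) == digest_as_int(message)

contract = b"I agree to pay 250.00 INR for order 1042."
signature = sign(contract)
print(verify(contract, signature))                                       # True
print(verify(b"I agree to pay 2.50 INR for order 1042.", signature))    # False
```

Because the signature verifies only against the exact message that was signed, the signer cannot later claim to have agreed to different terms, and cannot plausibly deny having signed as long as the private key was kept secret.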


F. Viruses 


A computer virus is a type of malicious code or program
written to alter the way a computer operates and that is 
designed to spread from one computer to another. A virus 
operates by inserting or attaching itself to a legitimate program 
or document that supports macros in order to execute its code. 
In the process a virus has the potential to cause unexpected 
or damaging effects, such as harming the system software by 
corrupting or destroying data. 

Once a virus has successfully attached to a program, file, 
or document, the virus will lie dormant until circumstances 
cause the computer or device to execute its code. In order for 
a virus to infect your computer, you have to run the infected 
program, which in turn causes the virus code to be executed. 
This means that a virus can remain dormant on your computer,
without showing major signs or symptoms. However, once
the virus infects your computer, the virus can infect other 
computers on the same network. Stealing passwords or data, 
logging keystrokes, corrupting files, spamming your email 
contacts, and even taking over your machine are just some 
of the devastating and irritating things a virus can do. 

Viruses can be spread through email and text message
attachments, Internet file downloads and social media scam links.
Mobile devices and smartphones can also become
infected with mobile viruses through shady app downloads.
Viruses can hide disguised as attachments of socially shareable
content such as funny images, greeting cards, or audio and
video files.


G. Worm 


A computer worm is a type of malicious software program 
whose primary function is to infect other computers while 
remaining active on infected systems. 

Worms replicate themselves over the Internet. It takes only
a short time for a worm to cause harm to millions of computers
globally. A worm also harms the resources of the infected computer.


H. Denial-of-Service Attack (DoS) 


A denial-of-service (DoS) attack is an attack aimed at depriving
legitimate users of online services. It is carried out by flooding
the network or server with useless and invalid authentication
requests, which eventually brings the whole network down,
resulting in no connectivity. As a result, users are
prevented from using a service. A denial-of-service attack is
essentially an attack on transmission: it denies the flow of data
between users.


I. Phishing Attack


A phishing attack is the criminally fraudulent process of
attempting to acquire sensitive information such as usernames,
passwords and credit card details by masquerading as a
trustworthy entity in an electronic communication.


VII. SECURE E-COMMERCE GUIDELINE 


A secure website uses encryption technology to transfer
information from your computer to the online merchant's computer.
Research the website before you order: if the company is
unfamiliar, do some homework before buying its
products. Read the shopping website's privacy and security
policy. Decide which payment mode is safest: credit card, debit
card, cash or cheque. By checking the address bar, which contains
the Uniform Resource Locator (URL), you can judge whether you are
dealing with the correct company. Learn the merchant's
cancellation, return and complaint-handling policies.

The URL should start with https:// instead of http://. A secure
connection is also indicated by a symbol (lock icon) or by a
message. Use a browser that warns the user before entering an
unsecured site. The security features provided by ISPs, web
designers and software companies should be reviewed. Pay attention
to security alerts and install security patches. Scan for spyware
and viruses periodically and keep the scanning software updated.
Keep backups of the system and its information. Merchants should
obtain a digital certificate for their websites, ensure that
transactional and account information is not stored in the
system, develop privacy policies, verify customer addresses, and
keep fraud filters to screen suspicious activities.
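
The address-bar check described above can be partly automated. The sketch below, which uses only the standard `urllib.parse` module, tests whether a URL uses https and whether its host really belongs to the expected merchant domain. The function name and the example domains are hypothetical, and such a check complements, rather than replaces, the browser's own certificate validation.

```python
from urllib.parse import urlsplit

def looks_secure(url: str, expected_host: str) -> bool:
    """Return True if the URL uses https and points at the expected site."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    expected = expected_host.lower()
    # Accept the domain itself or a genuine subdomain of it.
    same_site = host == expected or host.endswith("." + expected)
    return parts.scheme == "https" and same_site

# Example domains are placeholders chosen for this sketch.
print(looks_secure("https://shop.example.com/cart", "example.com"))        # True
print(looks_secure("http://shop.example.com/cart", "example.com"))         # False
print(looks_secure("https://example.com.evil.test/login", "example.com"))  # False
```

The third case shows a common phishing trick: the expected name appears at the start of the host, but the site actually belongs to a different domain.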


VIII. POLICY OF E-COMMERCE SECURITY 


A privacy policy is a statement or a legal document that
discloses some or all of the ways a party gathers, uses, discloses,
and manages a customer's or client's data. Secret-key
cryptography is built on two basic techniques, transposition and
substitution. A transposition cipher encrypts the original message
by changing the order in which its characters occur, whereas a
substitution cipher encrypts the original message by replacing its
characters with other characters. Public-key cryptography was
developed to solve the key distribution problem associated with
the secret-key method. The public-key method uses two different
but mathematically related keys: one key is used to
encrypt the data (the plaintext) and the second key is used
to decrypt the ciphertext. Digital certificates are electronic
credentials that are used to certify the identities of individuals,
computers, and other entities on a network.
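
The difference between substitution and transposition can be seen in the following toy Python sketch. A Caesar shift is used as the substitution cipher and a simple columnar arrangement as the transposition cipher; both are classical teaching examples chosen for this sketch and offer no real security.

```python
import string

ALPHABET = string.ascii_uppercase

def substitute(text: str, shift: int) -> str:
    """Substitution: replace each letter by the one `shift` places further on."""
    out = []
    for ch in text:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)  # leave non-letters unchanged
    return "".join(out)

def transpose(text: str, columns: int) -> str:
    """Transposition: write the text row by row, read it column by column."""
    return "".join(text[c::columns] for c in range(columns))

def untranspose(cipher: str, columns: int) -> str:
    """Undo transpose() by rebuilding the columns and reading row by row."""
    n = len(cipher)
    rows, extra = divmod(n, columns)
    # The first `extra` columns hold one more character than the rest.
    lengths = [rows + (1 if c < extra else 0) for c in range(columns)]
    cols, pos = [], 0
    for length in lengths:
        cols.append(cipher[pos:pos + length])
        pos += length
    return "".join(cols[i % columns][i // columns] for i in range(n))

plain = "PAYMENTDUE"
print(substitute(plain, 3))   # SDBPHQWGXH  (letters replaced)
print(transpose(plain, 3))    # PMTEAEDYNU  (same letters, new order)
```

In the substitution output every letter has changed but the positions are preserved; in the transposition output the letters are the same but their order has changed. Decryption uses `substitute(cipher, -3)` and `untranspose(cipher, 3)` respectively.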


A. PKI-based security model 


In the E-Commerce business model, there are customers,
merchants, banks, the public key infrastructure (PKI), etc. All
these parties rely on the PKI as their security mechanism and
communicate through the Internet. A secure handshake must be
established between the e-commerce server and the client.


B. Pretty Good Privacy 


PGP provides a confidentiality and authentication service 
that can be used for electronic mail and file storage applica- 
tions. PGP consists of five services: authentication, confiden- 
tiality, compression, e-mail compatibility and segmentation. 

1) Authentication: Authentication requires a digital
signature. The sender generates a hash code of the message; this
hash code is then encrypted with the sender's private
key and attached to the message. When the message has been
received, the recipient decrypts the attached hash code with the
sender's public key and compares it to another hash code, or
summary, calculated from the received message.

2) Confidentiality: Confidentiality is provided by encrypting
the messages to be transmitted. The sender generates a message
and a random 128-bit number to be used as a session key
for this message only. The message is encrypted with the
session key. The session key is encrypted with RSA, using
the recipient's public key, and is prepended to the message.
The receiver uses RSA with its private key to decrypt and
recover the session key. The session key is then used to decrypt
the message.

3) Compression: PGP can also compress the message if
desired. The compression algorithm is ZIP and the
decompression algorithm is UNZIP.

4) E-Mail Compatibility: Many electronic mail systems can
only transmit blocks of ASCII text. This can cause a problem
when sending encrypted data, since ciphertext blocks might not
correspond to ASCII characters that can be transmitted. PGP
overcomes this problem by using radix-64 conversion.

5) Segmentation: Another constraint of e-mail is that there
is usually a maximum message length. PGP automatically
subdivides an encrypted message into segments of an appropriate
length. On receipt, the segments must be reassembled before
the decryption process.
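
Three of the five services above (compression, radix-64 conversion and segmentation) can be sketched with the Python standard library. The signing and encryption steps are deliberately omitted here, since the standard library has no symmetric cipher. The segment length is an arbitrary value assumed for illustration, and the function names are hypothetical rather than part of any PGP implementation.

```python
import base64
import zlib

MAX_SEGMENT = 64  # assumed maximum segment length, in characters

def prepare(message: bytes) -> list:
    """Sender side: compress, convert to radix-64, split into segments."""
    compressed = zlib.compress(message)  # DEFLATE, the algorithm used by ZIP
    # In PGP the compressed data would be encrypted with the session key
    # at this point; that step is left out of this sketch.
    armored = base64.b64encode(compressed).decode("ascii")  # radix-64 text
    return [armored[i:i + MAX_SEGMENT]
            for i in range(0, len(armored), MAX_SEGMENT)]

def recover(segments: list) -> bytes:
    """Receiver side: reassemble, decode radix-64, decompress."""
    armored = "".join(segments)
    compressed = base64.b64decode(armored)
    return zlib.decompress(compressed)

message = b"Pay 250.00 INR to merchant 1042. " * 20
segments = prepare(message)
print(len(message), "bytes sent as", len(segments), "ASCII segment(s)")
print(recover(segments) == message)   # True
```

The receiver applies the steps in the reverse order, which is why the segments must be reassembled before anything else can be undone.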


C. Secure Communication Protocol 


The safety of the payment system is the key element of
e-commerce. Two key technologies for ensuring the safety of the
system are the Secure Sockets Layer (SSL) and Secure Electronic
Transaction (SET).

1) Secure Sockets Layer: The Secure Sockets Layer (SSL) is
the protocol that encrypts the whole session between computers
and provides a safe communication service on the Internet.
Two kinds of keys are used: a public key is used in the process
of connecting, and a special session key is used during the
session.

2) Secure Electronic Transaction (SET): Secure Electronic
Transaction (SET) was a communications protocol standard
for securing credit card transactions over insecure networks.
It has two main goals.

The first is to ensure the confidentiality of information and to
avoid wiretapping when information is transmitted online, so that
only the authorized party can obtain and decode the information.

The second is to ensure the integrity of payment information, so
that the data transmitted is received in full without any
alteration along the way.

Three stages are included in a SET trade:


• In the inquiry stage, the customer and the supplier confirm the
detailed information on the payment method.

• In the payment confirmation stage, the supplier confirms with
the bank that it will receive the payment as the trade proceeds.

• In the money-accepting stage, the supplier submits the
detailed information concerning all the relevant trades to the
bank.

Every stage involves data encryption technology and
digital signatures using RSA.
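
For the SSL part of this subsection, the sketch below shows how a client might open a protected connection with Python's standard `ssl` module. Modern systems use TLS, the successor of SSL, although the Python module keeps the historical name. The helper names are hypothetical and the host passed to `describe_connection` is whatever server the reader chooses; only the context creation runs without a network connection.

```python
import socket
import ssl

def make_client_context() -> ssl.SSLContext:
    """Build a client context that verifies the server's certificate."""
    context = ssl.create_default_context()
    # Refuse legacy protocol versions (an assumption of this sketch).
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context

def describe_connection(host: str, port: int = 443) -> dict:
    """Connect, run the handshake, and report what was negotiated."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=10) as raw:
        # The handshake uses the server's public key (from its certificate)
        # to agree on a session key; the session key then protects the data.
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return {
                "protocol": tls.version(),
                "cipher": tls.cipher(),
                "certificate": tls.getpeercert(),
            }

context = make_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

The two kinds of keys mentioned in the text map onto the two phases in `describe_connection`: public-key operations during the handshake, and a per-session symmetric key for the rest of the connection.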


D. Wireless Application Protocol 


Advocates of the Wireless Application Protocol (WAP) argue
that Wireless Transport Layer Security (WTLS) provides
a secure infrastructure for m-commerce applications. Critics
have decried the infamous “WAP gap”, where wireless requests
for Web pages are translated at the WAP gateway from the
WTLS protocol to the standard SSL protocol widely used in
secure HTTP requests.

WAP (Wireless Application Protocol) is a specification for 
a set of communication protocols to standardize the way 
that wireless devices, such as cellular telephones and radio 
transceivers, can be used for Internet access, including e-mail, 
the World Wide Web, newsgroups, and instant messaging. 
While Internet access has been possible in the past, different 
manufacturers have used different technologies. In the future, 
devices and service systems that use WAP will be able to 
interoperate. 

The WAP layers are:

• Wireless Application Environment (WAE)

• Wireless Session Protocol (WSP)

• Wireless Transport Layer Security (WTLS)

• Wireless Transaction Protocol (WTP)


IX. CONCLUSION 


E-commerce has spread worldwide, wherever the Internet is
reachable. We all use the facilities of e-commerce every single
day, directly or indirectly, with or without noticing exactly
how those services are carried out. E-commerce
consists of services more than products. Even the purchase
of goods involves a series of services behind the scenes, from
advertising and ordering to the delivery of the
product to the customer. The ATMs and net banking that we use
in our day-to-day life are an efficient result of
e-commerce technology. Security issues in e-commerce must
be given great importance, since e-commerce involves
financial as well as personal information, records and data
that are valuable and may result in serious malpractice and
misuse if leaked or accessed by hackers or cyber criminals.


Precautions and safety and security measures are given due
importance and improved technical support in parallel with
growing technologies; however, those who engage in malpractice
and intrusion keep finding new methods day by
day. This gives a massive headache to service and security
providers, who keep finding solutions to new
challenges as they arise. The security of e-commerce can never
be assured one hundred percent, but threats still cannot access
user data without the user's involvement. That is, the users of
e-commerce can resist the threats if they are a bit more careful
when they deal with e-commerce sites and services.


Abstract—Attribute-Based Encryption (ABE) is an effective
and promising technique that is used to provide fine-grained
access control to data in the Cloud. Initially, access to data in
the Cloud was provided through Access Control Lists (ACLs).
Attribute-Based Encryption is an access control mechanism
in which a user or a piece of data has attributes associated with it.
An access control policy is defined, and if the attributes satisfy the
access control policy, the user should be able to get access to the
piece of data. There are two kinds of ABE. In key-policy ABE
(KP-ABE), ciphertexts are labelled with sets of attributes and
private keys are associated with access policies. The access control
policy is stored with the user's private key, and the encrypted
data additionally stores a number of attributes associated with
the data. A user can decrypt a ciphertext if the attributes
associated with the ciphertext satisfy the access policy associated
with the private decryption key. Ciphertext-policy ABE (CP-ABE)
is essentially the converse of KP-ABE. An access policy
is associated with each ciphertext, and the private decryption key
can be reconstructed correctly if the user's attributes satisfy the
access policy.

Index Terms—Attribute, Access control policy, Key policy, 
Ciphertext policy, Encryption, Decryption. 


I. INTRODUCTION 
A. Cloud Computing 


Cloud computing is an information technology (IT) 
paradigm that enables ubiquitous access to shared pools of 
configurable system resources and higher-level services that 
can be rapidly provisioned with minimal management effort, 
often over the Internet. Cloud computing relies on sharing 
of resources to achieve coherence and economies of scale, 
similar to a public utility. Data outsourcing to third party 
cloud storage providers presents a number of issues. In order 
to provide virtually unlimited storage resources to end users, 
a cloud storage service usually spans multiple domains. Thus, 
data from different logical domains may be hosted at the same 
physical or virtual server, or the data may be segmented and 
stored on multiple servers across different security domains. 
Since virtualization hides the details of physical resources, 
the location of stored data becomes uncertain to users, which 
has a potential to result in mistrust of cloud storage service 
providers.

Fig. 1. Cloud computing

The goal of cloud computing is to allow users
to benefit from all of these technologies, without the
need for deep knowledge about or expertise with each one 
of them. The cloud aims to cut costs, and helps the users 
focus on their core business instead of being impeded by IT 
obstacles. The main enabling technology for cloud computing 
is virtualization. Virtualization software separates a physical 
computing device into one or more “virtual” devices, each of
which can be easily used and managed to perform computing 
tasks. With operating system-level virtualization essentially 
creating a scalable system of multiple independent computing 
devices, idle computing resources can be allocated and used 
more efficiently. Virtualization provides the agility required 
to speed up IT operations, and reduces cost by increasing 
infrastructure utilization. Autonomic computing automates the 
process through which the user can provision resources on- 
demand. By minimizing user involvement, automation speeds 
up the process, reduces labor costs and reduces the possibility 
of human errors. 


B. Different Types of Cloud Models 


A cloud model is composed of service models, based on the
service that the cloud offers, and deployment models,
based on the cloud's location. [6]

1) Deployment Models:

i) Private cloud: The cloud infrastructure is provisioned
for exclusive use by a single organization comprising
multiple consumers (e.g., business units). It may be
owned, managed, and operated by the organization, a
third party, or some combination of them, and it may
exist on or off premises.

ii) Public cloud: The cloud infrastructure is provisioned
for open use by the general public. It may be owned,
managed, and operated by a business, academic, or
government organization, or some combination of
them. It exists on the premises of the cloud provider.

iii) Community cloud: The cloud infrastructure is
provisioned for exclusive use by a specific community
of consumers from organizations that have shared
concerns (e.g., mission, security requirements, policy,
and compliance considerations). It may be owned,
managed, and operated by one or more of the
organizations in the community, a third party, or some
combination of them, and it may exist on or off
premises.

iv) Hybrid cloud: The cloud infrastructure is a
composition of two or more distinct cloud infrastructures
(private, community, or public) that remain unique
entities, but are bound together by standardized or
proprietary technology that enables data and
application portability (e.g., cloud bursting for load
balancing between clouds).


Fig. 2. Hybrid cloud
2) Service Models:

i) Infrastructure as a service (IaaS): IaaS refers to
online services that provide high-level APIs used to
abstract various low-level details of the underlying
network infrastructure, such as physical computing
resources, location, data partitioning, scaling,
security and backup.

ii) Platform as a service (PaaS): The capability provided
to the consumer is to deploy onto the cloud
infrastructure consumer-created or acquired applications
created using programming languages, libraries, services,
and tools supported by the provider. The consumer does
not manage or control the underlying cloud
infrastructure including network, servers, operating
systems, or storage, but has control over the deployed
applications and possibly configuration settings for the
application hosting environment.

iii) Software as a service (SaaS): The capability provided
to the consumer is to use the provider's applications
running on a cloud infrastructure. The applications
are accessible from various client devices through
either a thin client interface, such as a web browser
(e.g., web-based email), or a program interface. The
consumer does not manage or control the underlying
cloud infrastructure including network, servers,
operating systems, storage, or even individual application
capabilities, with the possible exception of limited
user-specific application configuration settings.

iv) Mobile "backend" as a service (MBaaS): In the
mobile "backend" as a service (MBaaS) model, also known
as backend as a service (BaaS), web app and mobile
app developers are provided with a way to link their
applications to cloud storage and cloud computing
services with application programming interfaces (APIs)
exposed to their applications and custom software
development kits (SDKs).


Fig. 4. Service models (cloud clients on top of the SaaS
application layer, the PaaS platform layer, and the
infrastructure layer of virtual machines, servers, storage,
load balancers and networks)


II. SERVICES OFFERED BY CLOUD SYSTEMS 


With regard to services, at the present time, the concept of 
cloud computing involves the provision of the following types 
of services to its users [7]: 


i) Everything as a Service: Under this model, users are
provided with all the software and hardware needed to
control business processes, including the interaction
between users; the user only needs to have access
to the Internet.

ii) Infrastructure as a Service: The computing
infrastructure is given to the user, typically as virtual
platforms (PCs) connected to the network. The
infrastructure can be adjusted to suit the purpose.

iii) Platform as a Service: The computing platform is
given to the user, with the operating system and the
required software.

iv) Software as a Service: This type of service is usually
positioned as software on demand. The software is
deployed on remote servers and the user can access it
via the Internet, and all updates and licenses for the
software are governed by the service provider. Payment
in this case is made for actual use of the software.

v) Hardware as a Service: In this case, the user of
the service leases the hardware for his own purposes.
This option saves on maintenance of the equipment,
but in essence differs little from Infrastructure as a
Service, except that the user gets the bare hardware
on which to deploy his own infrastructure using the
most appropriate software.

vi) Workplace as a Service: In this case, the company
uses cloud computing to organize the workplaces of
its employees by setting up and installing the software
that the personnel need in order to work.

vii) Data as a Service: The main idea of this type of
service is that the user is provided with storage space,
which may be used to store large amounts of
information.

viii) Security as a Service: This type of service enables
users to quickly deploy products that ensure the safe
use of Web technologies, the security of electronic
communications, and the safety of the local system,
which allows users of the service to save on
deploying and maintaining their own security system.


III. SECURE DATA SHARING


Cloud systems can be used to enable data sharing
capabilities, and this can provide an abundance of benefits
to the user. Data sharing is becoming increasingly important
for many users and is sometimes a crucial requirement,
especially for businesses and organizations aiming to gain
profit. People love to share information with one another.
Whether it is with friends, family, colleagues or the world,
many people benefit greatly through sharing data [1]. Some
of the benefits include:


• Higher productivity
• More enjoyment
• The ability to voice opinions


Data sharing is becoming increasingly prevalent in many 
industries and organizations. Financial institutions also benefit 
from data sharing and benefits include better customer support 
and better understanding of the needs of the customer. Shared 
data can be used to improve modeling, analysis and risk tools. 

With the advancements in Cloud computing, there is now a 
growing focus on implementing data sharing capabilities in the 
Cloud. For social users, the ability to share files, including
documents, photos and videos, with other users provides great
benefit. However, the main problems with data sharing
in the Cloud are privacy and security.


A. Ideal Requirements of Data Sharing in the Cloud 


To enable data sharing in the Cloud, it is imperative that 
only authorized users are able to get access to data stored in 
the Cloud. The ideal requirements of data sharing in the Cloud 
are: 


• The data owner should be able to specify a group of users
that are allowed to view his/her data.

• Any member of the group should be able to gain access
to the data at any time without the data owner's
intervention.

• No user other than the data owner and the members
of the group should gain access to the data, including the
Cloud Service Provider.

• The data owner should be able to revoke access to data
for any member of the group.

• The data owner should be able to add members to the
group.

• No member of the group should be allowed to revoke
the rights of other members of the group or to add new
users to the group.

• The data owner should be able to specify who has
read/write permissions on the data owner's files.
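
The requirements listed above can be read as the rules of a small access-control model. The following is a minimal sketch only: the `SharedFile` class and its method names are hypothetical, and no encryption is involved. It illustrates who may add members, revoke members and read, not how a real scheme would enforce these rules against the Cloud Service Provider.

```python
class SharedFile:
    """Toy model of owner-managed group sharing (no cryptography)."""

    def __init__(self, owner, data):
        self.owner = owner
        self._data = data
        # Maps each group member to the permissions granted by the owner.
        self._members = {}

    def _require_owner(self, actor):
        if actor != self.owner:
            raise PermissionError(f"{actor} is not the data owner")

    def add_member(self, actor, user, permissions=("read",)):
        # Only the data owner may add members or set read/write permissions.
        self._require_owner(actor)
        self._members[user] = set(permissions)

    def revoke_member(self, actor, user):
        # Only the data owner may revoke access.
        self._require_owner(actor)
        self._members.pop(user, None)

    def read(self, actor):
        # The owner and current members holding "read" may access the data;
        # everyone else (including the storage provider) is refused.
        if actor == self.owner or "read" in self._members.get(actor, set()):
            return self._data
        raise PermissionError(f"{actor} may not read this file")


f = SharedFile(owner="alice", data="grades.csv contents")
f.add_member("alice", "bob")
print(f.read("bob"))             # bob is currently in the group
f.revoke_member("alice", "bob")  # afterwards bob is refused again
```

In this sketch the rules are enforced by trusted code; the cryptographic techniques discussed in the following subsections aim to enforce comparable rules even when the storage server itself is not trusted.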


B. Privacy and Security Requirement of Data Sharing in the 
Cloud 


Privacy means that the provider must ensure that all critical
data are encrypted and that only authorized users have access
to the data in its entirety. Credentials and digital identities
must be protected, as must any data that the provider collects
about customer activity in the cloud. Security means that the
provider must ensure that customers' data outsourced to the
cloud is secure and must take security measures to protect
that information in the cloud.


• Data Confidentiality: Unauthorized users (including the
Cloud) should not be able to access data at any given
time. Data should remain confidential in transit, at rest
and on backup media. Only authorized users should be
able to gain access to data.

• User Revocation: When a user's access rights to data
are revoked, that user should not be able to gain access
to the data at any given time. Ideally, for efficiency, user
revocation should not affect other authorized users in the
group.

• Scalable and Efficient: Since the number of Cloud users
tends to be extremely large and at times unpredictable
as users join and leave, it is imperative that the system
remain efficient as well as scalable.

• Collusion between entities: When considering data
sharing methodologies in the Cloud, it is vital that even
when certain entities collude, they should still not be
able to access any of the data without the data owner's
permission.


C. Types of Secure Data Sharing 


The traditional approach is:

• Key Management

Recent approaches include:

• Attribute-Based Encryption

• Proxy Re-encryption

• Hybrid ABE and PRE

1) Key Management: Key management is anything you
do with a key except encryption and decryption. It covers
the creation and deletion of keys, activation and deactivation
of keys, transportation of keys, storage of keys and so on.
Most Cloud service providers provide basic key encryption
schemes for protecting data, or may leave it to users to
encrypt their own data. A Cloud Key Management
Infrastructure (CKMI) has been proposed which contains a
Cloud Key Management Client (CKMC) and a Cloud Key
Management Server (CKMS). The protocol covers objects
which contain keys, certificates, etc., the operations upon
them such as creation, deletion, retrieval and updating of
keys and certificates, and also attributes related to the
object in question, such as the object identifier. The method
is effective for proper key management; however, if the
server is broken, all the users' data is lost, and there is
no proper backup and recovery mechanism.

2) Attribute-Based Encryption: Attribute-Based Encryption
(ABE) is an effective and promising technique that is used
to provide fine-grained access control to data in the Cloud.
Initially, access to data in the Cloud was provided through
Access Control Lists (ACLs); however, this was not scalable
and only provided coarse-grained access to data.
Attribute-Based Encryption, as proposed by Goyal et al. for
fine-grained access control, provides more scalable and
fine-grained access control to data in comparison
to ACLs. Attribute-Based Encryption is an access control
mechanism where a user or a piece of data has attributes
associated with it. An access control policy is defined, and if
the attributes satisfy the access control policy, the user
should be able to get access to the piece of data. There are
two kinds of ABE:

• Key-Policy ABE (KP-ABE)

• Cipher text-Policy ABE (CP-ABE)

3) Proxy Re-encryption: Proxy Re-encryption (PRE) is
another technique that is fast being adopted for enabling
secure and confidential data sharing and collaboration in the
Cloud. Proxy Re-encryption allows a semi-trusted proxy with
a re-encryption key to translate a cipher text under the data
owner's public key into another cipher text that can be
decrypted with another user's secret key. At no stage will the
proxy be able to access the plaintext. Researchers have
utilized proxy re-encryption in relation to the Cloud, in
particular for secure and confidential data sharing and
collaboration in the Cloud.
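
The re-encryption step described above can be made concrete with a toy example. What follows is a minimal sketch, assuming an ElGamal-style construction over a very small prime-order group; it is not the scheme of any work cited here, the parameters `P`, `Q` and `G` and all function names are illustrative assumptions, and the group is far too small to offer any security. In this particular construction the re-encryption key is computed from both secret keys, which a practical scheme would want to avoid.

```python
import secrets

# Toy parameters (insecure): P = 2*Q + 1 with P and Q prime, and G = 4
# generates the subgroup of order Q in the multiplicative group mod P.
P, Q, G = 467, 233, 4

def keygen():
    sk = secrets.randbelow(Q - 1) + 1            # secret exponent in [1, Q-1]
    return sk, pow(G, sk, P)                     # (secret key, public key g^sk)

def encrypt(pk, m):
    k = secrets.randbelow(Q - 1) + 1             # fresh randomness per message
    return (m * pow(G, k, P)) % P, pow(pk, k, P) # (m * g^k, g^(sk*k))

def rekey(sk_from, sk_to):
    # Re-encryption key sk_to / sk_from mod Q (needs both secrets in this toy).
    return (sk_to * pow(sk_from, -1, Q)) % Q

def reencrypt(rk, ct):
    c1, c2 = ct
    return c1, pow(c2, rk, P)                    # g^(a*k) becomes g^(b*k)

def decrypt(sk, ct):
    c1, c2 = ct
    g_k = pow(c2, pow(sk, -1, Q), P)             # recover g^k from g^(sk*k)
    return (c1 * pow(g_k, -1, P)) % P            # divide the mask out of c1

alice_sk, alice_pk = keygen()
bob_sk, bob_pk = keygen()
ct_alice = encrypt(alice_pk, 42)                 # data owner encrypts for herself
ct_bob = reencrypt(rekey(alice_sk, bob_sk), ct_alice)  # proxy translates it
print(decrypt(alice_sk, ct_alice), decrypt(bob_sk, ct_bob))
```

The proxy only ever handles the pair `(c1, c2)` and the re-encryption key; it never computes `g^k`, which is what would be needed to unmask the message. The modular inverse via `pow(x, -1, n)` requires Python 3.8 or later.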

4) Hybrid ABE and PRE: ABE and Proxy Re-encryption
have also been used in combination with each other to provide
extra security and privacy for data sharing and collaboration
in the Cloud. A number of works in the literature combine
the power of the two schemes to provide a more robust
solution and to give the data owner further trust in the
secure sharing of data in the Cloud.


IV. ATTRIBUTE BASED ENCRYPTION 


Attribute-based encryption (ABE) has been developed as
a cryptographic primitive for the provision of fine-grained
access control to encrypted data. In ABE, a set of system
attributes is used to define user access rights or data access
policies. ABE thus appears to be a promising tool for the
protection of data in cloud storage environments.
Attribute-based encryption is a type of public-key encryption
in which the secret key of a user and the cipher text depend
upon attributes (e.g., the country in which the user lives, or
the kind of subscription the user has). In such a system, the
decryption of a cipher text is possible only if the set of
attributes of the user key matches the attributes of the
cipher text. ABE, first introduced by Sahai and Waters,
provides a mechanism by which we can ensure that even if
the storage is compromised, the loss of information will be
minimal. ABE effectively binds the access-control policy to
the data and the users (clients) instead of having a server
mediate access to files.

Access Policy: An access control policy defines the kind of
users who have permission to read the documents. For
example, in an academic setting, the grade sheets of a class
may be accessible only to the professor handling the course
and some teaching assistants (TAs) of that course. We can
express such a policy in terms of a predicate:
((Professor ∧ CS dept.) ∨ (M.Tech student ∧ course TA ∧ CS dept.))

We call the various credentials (or variables) of the
predicate attributes, and the predicate itself, which represents
the access policy, the access structure. In the example here
the access structure is quite simple. In reality, access
policies may be quite complex and may involve a large number
of attributes. There are two major classes of ABE schemes.

i) In key-policy ABE (KP-ABE), cipher texts are labeled
with sets of attributes and private keys are associated
with access policies. The access control policy is
stored with the user's private key, and the encrypted
data additionally stores a number of attributes
associated with the data. A user can decrypt a cipher text if
the attributes associated with the cipher text satisfy the
access policy associated with the private decryption
key. The access control policy is usually defined as an
access tree with interior nodes representing threshold
gates and leaf nodes representing attributes.

ii) Cipher text-policy ABE (CP-ABE) is essentially
the converse of KP-ABE. An access policy is
associated with each cipher text. A cipher text can be
decrypted correctly only if the user's attributes
satisfy the access policy. The access control policy is
stored with the data and the attributes are stored in the
user's key.

In an ABE system, users' private keys and cipher texts are
labeled with sets of descriptive attributes and access policies
respectively, and a particular key can decrypt a particular
cipher text only if the associated attributes and policy
match. ABE provides a secure way for a data owner
to share outsourced data on a trusted storage server with a
specified group of users.

An important property that attribute-based encryption
schemes must satisfy is collusion resistance. Collusion
resistance means that if two or more users possessing different
keys combine to decrypt the cipher text, they will be successful
if and only if any one of the users could have decrypted it
individually. In other words, even if multiple parties collude,
they should not be able to decrypt the cipher text unless one
of them was able to decrypt it completely by herself. This
property ensures that only users possessing the right keys have
access to the information. Moreover, as the encryption is based
on the access structure, it implicitly assures anonymous access
control.
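
The difference between a collusion-resistant check and a naive one can be illustrated in a few lines. This is a sketch under stated assumptions: policies are modelled as nested tuples and keys as plain attribute sets, both of which are illustrative choices rather than part of any ABE scheme, and no cryptography is involved.

```python
def satisfies(policy, attributes):
    """Evaluate a policy that is either an attribute name (str) or a
    tuple ("AND" | "OR", child, child, ...)."""
    if isinstance(policy, str):
        return policy in attributes
    op, *children = policy
    results = [satisfies(child, attributes) for child in children]
    return all(results) if op == "AND" else any(results)

def colluders_can_decrypt(policy, keys):
    # Collusion resistance: a group succeeds if and only if some single
    # key already satisfies the policy; attributes are never pooled.
    return any(satisfies(policy, key) for key in keys)

policy = ("AND", "Professor", "CS dept.")
alice = {"Professor", "EE dept."}
bob = {"M.Tech student", "CS dept."}

print(satisfies(policy, alice | bob))               # True: naive pooled check
print(colluders_can_decrypt(policy, [alice, bob]))  # False: required behaviour
```

Neither Alice nor Bob satisfies the policy alone, so a collusion-resistant scheme must refuse them even though the union of their attributes would pass.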

The efficiency drawback of ABE is that the computational
cost during the decryption phase grows with the complexity of
the access formula. Also, the attribute authority has to issue
private keys to all users, yet private key generation
typically requires large exponentiations. When a large number
of users request their private keys, this may overload the
attribute authority [5].


A. Key-Policy Attribute-Based Encryption (KP-ABE) 


The access control policy is stored with the user's private
key, and the encrypted data additionally stores a number of
attributes associated with the data. A user can only decrypt the
data if the attributes of the data satisfy the access control
policy in the user's key. The access control policy is usually
defined as an access tree with interior nodes representing
threshold gates and leaf nodes representing attributes. In a
key-policy attribute-based encryption (KP-ABE) system, cipher
texts are labeled by the sender with a set of descriptive
attributes, while each user's private key, issued by the
trusted attribute authority, captures a policy (also called the
access structure) that specifies which type of cipher texts the
key can decrypt. KP-ABE schemes are suitable for structured
organizations with rules about who may read particular
documents. Typical applications of KP-ABE include secure
forensic analysis and targeted broadcast. For example, in a
secure forensic analysis system, audit log entries could be
annotated with attributes such as the name of the user, the
date and time of the user action, and the type of data
modified or accessed by the user action. A forensic analyst
charged with some investigation would be issued a private
key associated with a particular access structure. The private
key would only open audit log records whose attributes
satisfy the access policy associated with the private key.

1) Problem Formulation: The system has three participating
entities: the cloud server, the users and a trusted third party.
The trusted third party generates the public key pk and the
private key sk for users. The users encrypt their private data
and send it to the cloud server. If a user wants the cloud
server to test the cipher text, then the cloud server is
authorized and gains a trapdoor tr. However, the cloud server
can only test whether two cipher texts contain the same
information and cannot decrypt them using the trapdoor.
Legitimate users access data according to their attributes and
can decrypt their cipher texts or test the cipher texts. If
legitimate users satisfy the access structure for the test,
they can get the test results of the cipher texts from the
cloud server. If they satisfy the access structure for
decryption, they can decrypt the cipher texts.

Fig. 5. Illustration of the system model of KP-ABE

A KP-ABE scheme consists of the following four algorithms:


i) Setup: This algorithm takes as input a security pa- 
rameter and returns the public key PK and a system 
master secret key MK. PK is used by message senders 
for encryption. MK is used to generate user secret keys 
and is known only to the Authority. 

ii) Encryption: This algorithm takes a message M, the
public key PK, and a set of attributes as input. It
outputs the cipher text E.

iii) Key Generation: This algorithm takes as input an
access structure T and the master secret key MK. It
outputs a secret key SK that permits the user to decrypt
a message encrypted under a set of attributes if and
only if that set satisfies T.

iv) Decryption: This algorithm takes as input the user's
secret key SK for access structure T and the cipher text
E, which was encrypted under the attribute set. It
outputs the message M if and only if the attribute set
satisfies the user's access structure T.

Fig. 6. KP-ABE Scheme


As shown in the above diagram, in KP-ABE, Bob encrypts
a message using a set of attributes. An access structure is
defined as a threshold tree of the policy that Bob wants
to enforce. Alice and Tim try to decrypt the message. The
attributes Alice has satisfy the access structure and hence she
can derive the key and decrypt the document. The attributes
Tim has do not satisfy the access structure and therefore he
cannot derive the key to decrypt the message. The key idea
here is that the key is associated with the policy using an
access structure.

The encryptor cannot decide who can decrypt the encrypted
data. It can only choose descriptive attributes for the data,
and has no choice but to trust the key issuer. KP-ABE is
therefore not naturally suited to certain applications. One
example is sophisticated broadcast encryption, where users
are described by various attributes and only a user whose
attributes match the policy associated with a cipher text can
decrypt it. The KP-ABE scheme supports user secret key
accountability. It provides fine-grained access control but
lacks flexibility and scalability.
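
The data flow of the four KP-ABE algorithms can be mimicked with a small mock. The sketch below contains no cryptography at all: the payload is stored in the clear, `"PK"` and `"MK"` are placeholder strings, and the class and function names are hypothetical. It only shows where the attributes and the policy live in KP-ABE, using the forensic audit log example from this section.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional

@dataclass(frozen=True)
class CipherText:
    attributes: FrozenSet[str]  # label chosen by the encryptor
    payload: str                # placeholder: NOT actually encrypted here

@dataclass(frozen=True)
class SecretKey:
    # Access structure T, fixed by the authority when the key is issued.
    policy: Callable[[FrozenSet[str]], bool]

def setup():
    return "PK", "MK"           # placeholders for public and master keys

def encrypt(pk, message, attributes) -> CipherText:
    return CipherText(frozenset(attributes), message)

def keygen(mk, policy) -> SecretKey:
    return SecretKey(policy)

def decrypt(sk: SecretKey, ct: CipherText) -> Optional[str]:
    # Succeeds only if the cipher text's attributes satisfy the key's policy.
    return ct.payload if sk.policy(ct.attributes) else None

pk, mk = setup()
entry = encrypt(pk, "bob read payroll.db",
                {"user:bob", "type:read", "date:2018-04-04"})
other = encrypt(pk, "carol wrote notes.txt",
                {"user:carol", "type:write", "date:2018-04-04"})
analyst_key = keygen(mk, lambda a: "user:bob" in a and "type:read" in a)

print(decrypt(analyst_key, entry))  # the matching log entry is readable
print(decrypt(analyst_key, other))  # None: attributes do not satisfy the policy
```

Note that the encryptor only chooses the attribute labels; which entries the analyst can open is decided entirely by the policy that the authority placed in the key.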


B. Cipher text-Policy Attribute-Based Encryption (CP-ABE)


In many situations, when a user encrypts data, it is imperative
that they establish a specific access control policy on who
can decrypt this data. In CP-ABE a user's private key is
associated with an arbitrary number of attributes expressed as
strings. On the other hand, when a party encrypts a message
in the system, they specify an associated access structure over
attributes. A user will only be able to decrypt a cipher text
if that user's attributes satisfy the cipher text's access
structure. At a mathematical level, access structures in the
system are described by a monotonic "access tree", where
the interior nodes of the access structure are threshold gates
and the leaves describe attributes. We note that AND gates can
be constructed as n-of-n threshold gates and OR gates as 1-of-n
threshold gates. Furthermore, more complex access controls
such as numeric ranges can be handled by converting them
to small access trees. In this setting, the encryptor must be
able to intelligently decide who should or should not have
access to the data that she encrypts. At a technical level, the
main objective that must be attained is collusion resistance: if
multiple users collude, they should only be able to decrypt
a cipher text if at least one of the users could decrypt it on
their own. In an attribute-based encryption system, cipher
texts are not necessarily encrypted to one particular user as in
traditional public key cryptography. Instead, both users'
private keys and cipher texts are associated with a set of
attributes or a policy over attributes. A user is able to decrypt
a cipher text if there is a "match" between his private key and
the cipher text. In a threshold ABE system, cipher texts are
labeled with a set of attributes S and a user's private key is
associated with both a threshold parameter k and another set
of attributes S'. For a user to decrypt a cipher text, at
least k attributes must overlap between the cipher text and his
private key.
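
The access tree with threshold gates described above can be sketched as a small data structure. This is an illustrative sketch only: the `Gate` class and the helper names `AND`, `OR`, `satisfied` and `threshold_match` are assumptions made for this example, and the code checks satisfaction of a policy without performing any encryption.

```python
from dataclasses import dataclass
from typing import List, Set, Union

@dataclass
class Gate:
    k: int                   # threshold: at least k children must be satisfied
    children: List["Node"]

Node = Union[Gate, str]      # a leaf is simply an attribute name

def AND(*children: Node) -> Gate:
    return Gate(len(children), list(children))   # n-of-n threshold gate

def OR(*children: Node) -> Gate:
    return Gate(1, list(children))               # 1-of-n threshold gate

def satisfied(node: Node, attrs: Set[str]) -> bool:
    if isinstance(node, str):
        return node in attrs
    # Count satisfied children and compare against the gate's threshold.
    return sum(satisfied(child, attrs) for child in node.children) >= node.k

def threshold_match(ct_attrs: Set[str], key_attrs: Set[str], k: int) -> bool:
    # Threshold ABE: at least k attributes must overlap between S and S'.
    return len(ct_attrs & key_attrs) >= k

tree = OR(AND("Professor", "CS dept."),
          AND("M.Tech student", "course TA", "CS dept."))

print(satisfied(tree, {"Professor", "CS dept."}))       # True
print(satisfied(tree, {"M.Tech student", "CS dept."}))  # False
print(threshold_match({"a", "b", "c"}, {"b", "c", "d"}, 2))  # True
```

A general k-of-n gate is written directly as `Gate(k, [...])`, so AND and OR are just its two extreme cases.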

In our context, the role of the parties is taken by the
attributes. Thus, the access structure A will contain the
authorized sets of attributes. We restrict our attention to
monotone access structures. However, it is also possible to
(inefficiently) realize general access structures using these
techniques by treating the negation of an attribute as a
separate attribute altogether. Thus, the number of attributes
in the system will be doubled. From now on, unless stated
otherwise, by an access structure we mean a monotone access
structure [2].
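
The doubling of the attribute universe mentioned above can be seen in a short example. This is a sketch with hypothetical helper names (`with_negations`, `describe`) and a made-up `"NOT "` naming convention; it only shows how a policy containing a negation becomes a monotone policy once negations are treated as attributes in their own right.

```python
def with_negations(universe):
    # Every attribute gets an explicit negated twin, doubling the universe.
    return set(universe) | {f"NOT {a}" for a in universe}

def describe(held, universe):
    # A user is described by the attributes held plus the explicit
    # negation of every attribute in the universe that is not held.
    held = set(held)
    return held | {f"NOT {a}" for a in universe if a not in held}

universe = {"Professor", "Student", "CS dept."}
print(len(with_negations(universe)))        # 6

attrs = describe({"Professor", "CS dept."}, universe)
# "CS dept. AND NOT Student" is now a plain AND over two attributes.
print({"CS dept.", "NOT Student"} <= attrs)  # True
```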

Fig. 7. System model of CP-ABE

System Model: The concrete system model of our proposed
CP-ABE is shown in Fig. 7; it mainly consists
of four entities as follows.


i) Attribute Authority (AA): It is responsible for imple- 
menting the system setup algorithm to generate the 
system setup parameters and implementing the key 
generating algorithm to generate the secret key for the 
data user. 

ii) Data Owner (DO): He is responsible for implementing
the data encryption algorithm on the plaintext data and
sending the generated cipher text to the CSP. If the DO
decides that some attributes need to be revoked, he first
designates the corresponding list of revoked users and
then sends the list to the CSP.

iii) Data User (DU): He is responsible for implementing 
the decryption algorithm. If the DU wants to access the 
data in the CSP, he will first send his transformation 
key to the CSP for partial decryption. Once the DU 
receives the partially decrypted cipher text, he will use 
his secret key to implement the final decryption. 

iv) Cloud Storage Provider (CSP): He is responsible
for implementing the data re-encryption algorithm to
update the cipher text and for implementing
the partial decryption algorithm for the DU. Here, we
assume that the CSP is curious but honest; namely,
he will honestly execute the tasks assigned by other
legitimate entities in the system, but he has an
incentive to learn as much as possible about the
contents of the encrypted data.

Fig. 8. CP-ABE Access control


TABLE I
COMPARISON BETWEEN KP-ABE AND CP-ABE

Parameters | KP-ABE | CP-ABE
Fine-grained access control | Low; high if there is a re-encryption technique | Average realization of complex access control
Efficiency | Average; high for broadcast-type systems | Average; not efficient for modern enterprise environments
Computational overhead | High computational overhead | Average computational overhead
Collusion resistance | Good | Good
Cipher texts | Associated with a set of attributes | Associated with policies
User's key | Associated with policies | Associated with a set of attributes
Encryptor | No control over who has access to the encrypted data | Must be able to intelligently decide who should or should not have access to the data that she encrypts
Access tree | Access tree in user's key, list of attributes in cipher text | Access tree in cipher text, list of attributes in user's key
Control | User encrypting the file has limited control over who decrypts | User encrypting has strong control
Security | Average | High
Access policy | Can be determined after encryption | Can be determined after encryption

In CP-ABE, each user is linked with a set of attributes,
and his secret key is generated based on these attributes. While
encrypting a message, the encryptor specifies a threshold
access structure over the attributes of interest. The message
is then encrypted based on this access structure such that
only those whose attributes satisfy the access structure can
decrypt it. With the CP-ABE technique, encrypted data can be
kept confidential and secure against collusion attacks. A
cipher text-policy attribute-based encryption scheme consists
of four fundamental algorithms: Setup, Encrypt, KeyGen, and
Decrypt. In addition, we allow for the option of a fifth
algorithm, Delegate.


i) Setup: The setup algorithm takes no input other than
the implicit security parameter. It returns the public
key PK as well as a system master secret key MK.
PK is used by message senders for encryption. MK is
used to generate user secret keys and is known only
to the authority.

ii) Encrypt (PK, M, A): The encryption algorithm takes
as input the public parameters PK, a message M, and
an access structure A over the universe of attributes.
The algorithm encrypts M and produces a cipher
text CT such that only a user that possesses a set of
attributes that satisfies the access structure will be able
to decrypt the message. We assume that the cipher
text implicitly contains A.

iii) Key Generation (MK, S): The key generation
algorithm takes as input the master key MK and a set of
attributes S that describe the key. It outputs a secret
key SK that enables the user to decrypt a message
encrypted under an access tree structure T if and only
if S satisfies T.

iv) Decrypt (PK, CT, SK): The decryption algorithm
takes as input the public parameters PK, a cipher text
CT, which contains an access policy A, and a private
key SK, which is a private key for a set S of attributes.
If the set S of attributes satisfies the access structure
A, then the algorithm decrypts the cipher text and
returns a message M.

v) Delegate (SK, S̃): The delegate algorithm takes as
input a secret key SK for some set of attributes S and
a set S̃ ⊆ S. It outputs a secret key for the set of
attributes S̃.
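
The five CP-ABE algorithms can be mimicked by a mock that mirrors their inputs and outputs. As a sketch it makes strong simplifying assumptions: there is no cryptography, the payload is kept in the clear, `"PK"` and `"MK"` are placeholder strings, and all class and function names are hypothetical. Its purpose is to show that the policy travels with the cipher text, the attributes travel with the key, and Delegate may only hand out a subset of a key's attributes.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional

Policy = Callable[[FrozenSet[str]], bool]

@dataclass(frozen=True)
class CipherText:
    policy: Policy              # access structure A, stored with the data
    payload: str                # placeholder: NOT actually encrypted here

@dataclass(frozen=True)
class SecretKey:
    attributes: FrozenSet[str]  # attribute set S, stored in the user's key

def setup():
    return "PK", "MK"           # placeholders for public and master keys

def encrypt(pk, message: str, policy: Policy) -> CipherText:
    return CipherText(policy, message)

def keygen(mk, attributes) -> SecretKey:
    return SecretKey(frozenset(attributes))

def decrypt(pk, ct: CipherText, sk: SecretKey) -> Optional[str]:
    # Succeeds only if the key's attributes satisfy the cipher text's policy.
    return ct.payload if ct.policy(sk.attributes) else None

def delegate(sk: SecretKey, subset) -> SecretKey:
    subset = frozenset(subset)
    if not subset <= sk.attributes:
        raise ValueError("delegated attributes must be a subset of the key's")
    return SecretKey(subset)

pk, mk = setup()
ct = encrypt(pk, "grade sheet", lambda s: {"Professor", "CS dept."} <= s)
prof = keygen(mk, {"Professor", "CS dept.", "course TA"})
helper = delegate(prof, {"course TA", "CS dept."})

print(decrypt(pk, ct, prof))    # the professor's key satisfies the policy
print(decrypt(pk, ct, helper))  # None: the delegated key is weaker
```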


V. CONCLUSION 


Data sharing and collaboration in the Cloud is fast becoming
widely available as demand for data sharing continues to
grow rapidly. This paper presents a review of how secure
and confidential data sharing can be enabled in the Cloud
using attribute-based encryption. The paper examined
definitions related to Cloud computing and privacy. It then
looked at privacy and security issues affecting the Cloud,
followed by what is being done to address these issues. Next,
it discussed why data sharing in the Cloud is important and
the traditional approach to data sharing in the Cloud. It
explained the different techniques, namely ABE and PRE,
that are currently used to enable secure data sharing in the
Cloud, and reviewed different schemes in attribute-based
encryption. ABE is one of the methods used for secure data
sharing in the cloud. It is an effective and promising
technique for providing fine-grained access control
to data in the Cloud. Initially, access to data in the Cloud was
provided through Access Control Lists (ACLs).
Attribute-Based Encryption is an access control mechanism
where a user or a piece of data has attributes associated with
it. An access control policy is defined, and if the attributes
satisfy the access control policy, the user should be able to
get access to the piece of data. It provides collusion
resistance.

The two different schemes are key-policy ABE and cipher
text-policy ABE. In KP-ABE the access control policy is stored
with the user's private key and the encrypted data
additionally stores a number of attributes associated with the
data. A user can only decrypt the data if the attributes of the
data satisfy the access control policy in the user's key.
CP-ABE is essentially the converse of KP-ABE. The
access control policy is stored with the data and the attributes
are stored in the user's key. Even though ABE is one of the
good methods for secure data sharing, it has certain
drawbacks. When the complexity of the access formula grows,
the computational cost during the decryption phase increases.
The attribute authority has to issue private keys to all users,
yet private key generation typically requires large
exponentiations. When a large number of users request their
private keys, this may overload the attribute authority.

Abstract—QR code is the trademark for a type of matrix
bar code (or two-dimensional bar code). A bar code is a
machine-readable optical label that contains information about
the item to which it is attached. Before a QR code picture can
be identified, it is converted to a binary image. The
experiments show that the new binarization method significantly
improves the efficiency and accuracy of QR code recognition.

Index Terms—Structure of QR code, Binarization, Binarization
Methods


I. INTRODUCTION 


Bar codes are widely used to store and retrieve information
on even tiny objects because of their size, reading speed,
accuracy, 360-degree reading and error correction capability.
1D bar codes hold data only in the horizontal direction, with
limited data capacity, whereas 2D bar codes can hold data in
both the horizontal and vertical directions of the image.
Compared with a 1D bar code, the data capacity of a QR code is
more than 100 times larger. The QR code was invented by Denso
Wave Corporation, Japan, in 1994. The Quick Response code (QRC)
has been adopted by many companies in Japan and by
organisations such as Automatic Identification Manufacturers
Inc. and the Automotive Industry Action Group. Working with QR
codes is not only a matter of encoding and decoding; beyond
that, it is strongly image-processing oriented.

A QR code is fundamentally an image, and such images appear
in many places. A QR image can store up to 7089 numeric
characters. QR code images are already used in Japan for
various purposes, such as air tickets and processed-food
traceability chains. In India, QR code images are found on
government-issued documents such as income tax certificates
and the Aadhaar card. QR codes are thus a widely accepted
technology for fast access to data around us. At present, more
than 100 websites generate QR codes for personal or official
use free of cost.

The QRC image is everywhere around us. This paper deals with
the comparison of image binarization methods for the QR code.
Binarizing an image means converting it into black and white.
The information is reduced to only two values, either 0 or 1,
which makes processing easier in the following stages, in
particular when the various components of the QR code image
are manipulated.


II. STRUCTURE OF QUICK RESPONSE CODE


A two-dimensional bar code that can store information in both
the X and Y directions is commonly known as a Quick Response
code. Since the QR code can store information in both
directions, it has a comparatively high capacity to hide
information, unlike the usual bar code. The QR code also has
enhanced security functionality because of its improved
structure.

Generally, Quick Response codes are used to hide simple
information such as a web URL or an important bank account
number. They are also well known for their ability to hold
Japanese and even Arabic characters.

Let us briefly discuss the structure of the QR code, which
helps us understand how information can be hidden in a more
structured way.


Fig. 1. Structure of QR code (labelled parts: finder pattern,
timing pattern, quiet zone)


• Finder pattern: This pattern is used to detect the position
of the QR code. It is also used to identify the angle and
dimension of the QR code symbol correctly. The finder pattern
makes it possible to detect the QR code from any angle
(360 degrees).

• Alignment pattern: The middle black module of the alignment
pattern makes the identification of distortion simple and
efficient.

• Timing pattern: This is a supporting pattern that helps to
identify the central coordinates of each cell if any error
exists. It can be seen in both the horizontal and vertical
directions.

• Quiet zone: This margin helps to separate the QR code from
complex backgrounds and makes our data embedding technique
simpler.

• Data area: This is the area where we can actually embed our
secret information. If the QR code is in binary format, we can
easily embed zeros in the black modules and ones in the
corresponding white modules, which is very convenient for
information hiding.


III. BINARISATION PROCESS IN QR CODE 


A. Methods of Binarization 


A QR code must be converted to a binary image, which has only
two values (black and white), before recognition. The usual
practice is to choose an appropriate threshold from the grey
levels between the black and white modules of the QR code
symbol and to turn each pixel black or white by comparing it
with the threshold, which yields the binary image. The
different types of binarization methods for QR code images are
briefly discussed below:


1) Otsu Method: Otsu's method is based on the principle that
the gray level for which the between-class variance is maximum
(equivalently, the within-class variance is minimum) is
selected as the threshold. Otsu's method calculates a global
threshold by assuming the existence of two classes, foreground
and background, and choosing the threshold that minimizes the
within-class variance of the thresholded black and white
pixels. It improves image segmentation and is one of the most
efficient methods for thresholding a gray image. It can be
implemented by two approaches.

• Iteration Approach: The iteration approach is the exact
representation of Otsu's method. Here the within-class
variance is simply the sum of the two class variances
multiplied by their associated weights. It requires a lot of
computation.

• Custom Approach: This approach is easier, using simple
recurrence relations. Here the threshold is calculated by an
optimized formula of Otsu's method.


2) Adaptive Nested Local Binarization: In adaptive nested
local binarization, the QR code image is first decomposed into
several sub-graphs; each sub-graph is then expanded to form a
new sub-graph, and each new sub-graph is binarized. First, the
QR code image is divided into k * k blocks, that is, k * k
sub-graphs. Each sub-graph is checked for more than two peaks.
If a sub-graph has more than two peaks, the image is divided
into (k + 1) * (k + 1) blocks, and so on until all sub-graphs
have two peaks or fewer. Then each sub-graph is expanded to
form a new sub-graph, to which the Otsu method is applied.
Finally, the binarization results are spliced together
sequentially.

3) Niblack's Algorithm: In Niblack's algorithm, the threshold
value is calculated pixel-wise for the local area under a
window. The threshold depends on the local mean and standard
deviation of the window area.

4) Sauvola Method: The Sauvola method is a modified form of
Niblack's algorithm. It performs better than Niblack's
algorithm under conditions such as light variation on the
document image, light texture, etc.

In the Sauvola threshold, m is the mean of the pixels under
the window area, S is the dynamic range of the variance, and
the parameter k may take values in the range 0 to 1.
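
The thresholds described above can be sketched in a few lines.
The code below is a minimal illustration under stated
assumptions rather than the implementation used in the
experiments: it assumes 8-bit grey levels, the commonly cited
forms T = m + k*s for Niblack and T = m * (1 + k * (s / R - 1))
for Sauvola (with R playing the role of the dynamic range called
S above), and typical default values k = -0.2, k = 0.5 and
R = 128. Otsu's threshold is found by an exhaustive search that
maximises the between-class variance.

```python
from statistics import mean, pstdev


def otsu_threshold(pixels):
    """Global Otsu threshold for 8-bit grey levels (integers 0..255).

    Tries every level t and keeps the one with the largest between-class
    variance; pixels <= t form the dark class, the rest the light class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count of the dark class so far
    sum0 = 0  # sum of grey levels of the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2  # proportional to the variance
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def niblack_threshold(window, k=-0.2):
    """Local threshold from the mean and standard deviation of a window."""
    return mean(window) + k * pstdev(window)


def sauvola_threshold(window, k=0.5, r=128.0):
    """Sauvola's modification; r is the assumed dynamic range."""
    return mean(window) * (1 + k * (pstdev(window) / r - 1))


def binarize(pixels, threshold):
    """Map each pixel to 1 (white) if above the threshold, else 0 (black)."""
    return [1 if p > threshold else 0 for p in pixels]


sample = [12, 15, 10, 200, 210, 190, 14, 205]
t = otsu_threshold(sample)
print(t, binarize(sample, t))
print(niblack_threshold(sample), sauvola_threshold(sample))
```

Otsu's method yields one threshold for the whole image, whereas
the Niblack and Sauvola functions would be called once per pixel
with the grey levels of the surrounding window.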


IV. EXPERIMENT AND RESULT COMPARISON 
A. Experiment 


In order to approximate the practical application environment
as closely as possible, the experiment has three steps. First,
a QR code image is displayed on a monitor and captured by a
camera under various lighting conditions. Then the QR code
image is binarized and identified using various binarization
methods, including the Otsu algorithm, the iterative method,
the Niblack algorithm, the Sauvola method and adaptive nested
local binarization. Finally, the results are compared.


B. BER Comparison


Evaluation criteria: A QR code image is used to store
information, and identifying it requires distinguishing its
black and white pixels. In the communication field, BER stands
for bit error rate. In this article, BER is the ratio of pixels
in the binary image produced by a binarization method that
differ from the original image. The BER value is used to
evaluate the binarization effect.
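
As a small worked illustration of this criterion, the sketch
below computes the BER of a binarized image against a reference
image. It assumes both images are given as equal-sized lists of
rows containing 0 and 1; the function name and the sample data
are hypothetical.

```python
def bit_error_rate(reference, binarized):
    """Fraction of pixels whose binary value differs between two images."""
    if len(reference) != len(binarized) or any(
        len(a) != len(b) for a, b in zip(reference, binarized)
    ):
        raise ValueError("images must have the same dimensions")
    total = sum(len(row) for row in reference)
    if total == 0:
        raise ValueError("images must not be empty")
    errors = sum(
        a != b
        for ref_row, bin_row in zip(reference, binarized)
        for a, b in zip(ref_row, bin_row)
    )
    return errors / total


reference = [[0, 1, 1, 0],
             [1, 0, 0, 1]]
binarized = [[0, 1, 0, 0],
             [1, 0, 0, 0]]
print(bit_error_rate(reference, binarized))  # 2 of 8 pixels differ: 0.25
```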


Methods    Otsu         Niblack      Nested local    Sauvola
           algorithm    algorithm    method          method

BER        0.17942      0.1598       0.0996          0.1645


C. Identification Success Rate Comparison


QR code images were collected in different lighting situations,
including bright light, low-light illumination, uneven
illumination, reflection and so on. The following table gives
the identification success rate after 100 experiments.

Methods           Otsu         Niblack      Nested local    Sauvola
                  algorithm    algorithm    method          method

Identification    42%          60%          98%             62%
success rate


V. RESULTS


From the comparison of binarization methods, the nested local
binarization method gives the lowest BER. The lower the BER
value, the better the image quality. The nested local
binarization method also gives the highest identification
success rate.


VI. CONCLUSION 


In this paper, the binarization of QR codes is studied. The
nested local binarization method gives good subjective and
objective results compared with the traditional methods. It
can improve the decoding efficiency and accuracy of QR code
image identification.

Abstract—Continuous delivery is the ability to get changes
of all types, including new features, configuration changes,
bug fixes and experiments, into production or into the hands of
users safely and quickly in a sustainable way. This paper
illustrates the architecture for continuous delivery, describes
the implications of architecting for CD, and explains how
Jenkins evolved from continuous integration to continuous
delivery. The introduction of continuous delivery of mobile
apps in a corporate environment is presented as a case study.


Index Terms—Continuous Delivery, Continuous Deployment,
Continuous Software Engineering, Jenkins, Release Management,
Configuration Management, Continuous Integration, User
Feedback


I. INTRODUCTION 


Continuous delivery is a software engineering approach in
which teams produce software in short cycles, ensuring that
the software can be reliably released at any time. It aims at
building, testing and releasing software faster and more
frequently. The continuous delivery approach gives an
organization the ability to bring services to the market
rapidly, efficiently and reliably, and thus to stay a step
ahead of the competition. The Software Delivery Process (SDP)
consists of several tasks that promote the artefacts created
into the production environment. Because of the unique
requirements and characteristics of each software product, a
general process for all contexts probably cannot be set; we
should interpret the Software Delivery Process as a framework
to be customized to the characteristics of each software
product.

This paper addresses the problem of continuous software
delivery. The main objective is a setup of services,
processes, techniques and tools that assist in delivering
software continuously. We attempt to answer the following
questions.

i) What are the characteristics of the CD context?
ii) Why do we want to architect for CD?
iii) What does architecting for CD imply?

The answers to these questions provide practitioners with
insights for architecting their software applications for CD
and solving the various challenges associated with CD.

User guides, also known as user manuals, are a type of
documentation aimed at helping a user operate a specific
system. User guides usually include screenshots that show users
how to interact with the user interface. Because creating such
screenshots is a slow, manual process, keeping the user guide
up to date with changes in the user interface is challenging.
Screenshots are sometimes annotated with arrows, rectangles or
text that highlight specific elements with which the user
should interact, or where the user can find important
information.

In business environments, software development is increasingly
dynamic. Fast-changing markets, complex and changing customer
requirements, the pressure of shorter time-to-market, and
rapidly advancing information technologies are characteristics
found in most mobile software development projects. Digital
transformation is taking place in the mobile application
sector, and usage is increasing. Agile software development
practices have seen a major boom in the past decades due to
their ability to cope with changing requirements and to enable
short release cycles. Continuous software engineering refers
to the organizational capability to develop, release, and
learn from software in rapid parallel cycles. It aims to
integrate code analysis and every other aspect of software
development, from source code up to the delivery to productive
environments, in a continuous integration pipeline. The
ultimate goal is being able to deliver the latest software to
the customer at the push of a button.

In this paper we describe Rugby's initial release management
workflow and analyze its applicability in eight industrial
projects at Capgemini. Additional requirements found in
personal interviews with project managers led to an extended
workflow focused on tailoring, which we describe below. We
evaluated this workflow and its impact on industry projects
and present our findings.


II. TOWARDS ARCHITECTING FOR CONTINUOUS 
DELIVERY 


Continuous delivery (CD) is a software engineering ap- 
proach in which teams produce software in short cycles, 
ensuring that the software can be reliably released at any time. 
It aims at building, testing, and releasing software faster and 
more frequently. The approach helps reduce the cost, time, and 
risk of delivering changes by allowing for more incremental 
updates to applications in production. A straightforward and 
repeatable deployment process is important for continuous 
delivery. To practice continuous delivery effectively, software
applications have to meet a set of architecturally significant
requirements (ASRs) such as deployability, modifiability, and
testability.

These ASRs require a high priority and cannot be traded off
lightly anymore. In the context of enterprise architecture
there are typically multiple attributes we are concerned about,
for example availability, security, performance, usability and
so forth. In continuous delivery, we introduce two new
architectural attributes: testability and deployability. In a
testable architecture, we design our software such that most
defects can (in principle, at least) be discovered by
developers by running automated tests on their workstations.
In a deployable architecture, deployments of a particular
product or service can be performed independently and in a
fully automated fashion, without the need for significant
levels of orchestration.


A. Organizational context 


The company heavily relies on an increasingly large number 
of custom software applications. These applications include 
websites, mobile applications, trading and pricing systems, 
feeds distribution systems, and software used in the betting 
shops. These applications are developed using a wide range 
of technology stacks, including Java, Ruby, PHP, and .Net. 
To run these applications, the company has an IT infras- 
tructure, which consists of thousands of servers in different 
geographical locations. These applications are developed and
maintained by the Technology Department, which employs
approximately 400 people. The size of each software
development team varies from two to 26 members, depending on
the size and complexity of the application. The majority of
the teams have four to eight people.


B. Characteristics of continuous delivery 


Continuous delivery is defined as a software engineering
discipline in which teams keep producing valuable software
incrementally in short cycles and ensure that the software can
be reliably released at any time. Its characteristics are
given below.

1) Releasable at Any Time/Frequent Releases: Teams that
practice CD ensure the software is releasable at any time. They
usually make frequent releases, as frequent as multiple times
a day.

2) Reliable/Automated Release: If a release involves deploying
an application to a production environment, the deployment
should be reliable. To achieve a reliable release, the
deployment is usually automated. Before the production
deployment is executed, the deployment process and scripts
have usually been exercised several times in different testing
environments.

3) Delivering Valuable Software: The team continuously makes
sure the software being developed provides value to customers.
This implies the ability to quickly gather user feedback on a
feature once it is delivered.

4) Small Size: The size of a user story is usually small
enough that it can be finished within a week. The number of
user stories in a single release is also small. Keeping such
increments small helps to reduce the cycle time and release
risks.


C. Why Do We Architect for Continuous Delivery?


The main reason is the huge benefits we have observed after 
moving 22 software applications to continuous delivery. 

1) Accelerated Time to Market: The release frequency has 
dramatically increased from once every one to six months 
to once a week on average. Some applications were released 
multiple times a day when needed. 

2) Build the Right Product: The feedback enables the teams to
work only on useful features. When a feature is found not to
be useful, no further effort is spent on it. This helps the
team to build the right product.

3) Improved Productivity and Efficiency: Significant im- 
provement in productivity and efficiency was also observed. 

4) Improved Product Quality: Significant product quality
improvement was observed. The number of open bugs for the
applications has been reduced by more than 90 percent. In
addition, the number of priority 1 incidents in production has
been reduced significantly.

5) Improved Customer Satisfaction: The users of the ap- 
plications are internal customers in a different department. 
Managers have commented that the relationship between these 
two departments has improved. Trust has been established. 


D. What Does Architecting for Continuous Delivery Imply? 


The most salient implications of CD for the architectures of
software applications concern Architecturally Significant
Requirements (ASRs). ASRs are those requirements that have a
measurable impact on a software system's architecture.

1) Deployability: After we moved to CD, typically a soft- 
ware application is deployed to several testing environments 
multiple times a day and deployed to the production environ- 
ment once or twice a week. 

2) Security: With more frequent releases, the application goes
down and comes up more frequently. The security
vulnerabilities of the application during start-up become more
important, as attackers (hackers) get more opportunities to
attack the software during its start-up time.

3) Loggability: Architecture principles should be put in place
to guide the logging of the applications. The architecture
principles can include rules on what to log, the log format,
the logging mechanism, etc. Each of these aspects is important.

4) Modifiability: In continuous delivery, the concern of
modifiability also extends to how a modification can be
deployed. When architecting an application for modifiability,
the ease with which modifications can be deployed should also
be considered and evaluated.

5) Monitorability: Monitoring is important for practicing
CD. We rely on monitoring to get immediate feedback on a
deployment, especially when things are broken, so that we can
take remedial actions before those broken things have a major
impact on our customers.

6) Testability: To ensure the readiness for release, the CD
pipeline heavily relies on tests that are hooked into the stages.
Through these tests, the team ensures that when a code change
passes all the stages, it is ready for release. Good testability
should be architected into the software application, so that
developing these tests is feasible and cost effective.
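
Of the requirements listed above, loggability is perhaps the
easiest to make concrete. The following minimal sketch shows
one way an architecture principle on log format and logging
mechanism could be expressed with Python's standard logging
module. The one-JSON-object-per-line format, the field names
and the service name are assumptions made for illustration;
they are not prescribed by the approach described here.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line (assumed format)."""

    def format(self, record):
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(entry)


def build_logger(name):
    """Return a logger that every application would configure the same way."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger


log = build_logger("pricing-service")  # hypothetical application name
log.info("deployment finished, version=%s", "1.4.2")
```

A uniform, machine-readable format of this kind also supports
monitorability, since the same log lines can be collected and
checked automatically after each deployment.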


III. A PRACTICAL APPROACH TO SOFTWARE
CONTINUOUS DELIVERY 


The Software Delivery Process (SDP) consists of several tasks
that promote the artifacts created into the production
environment (the server where an executable is installed to
deliver features to the users). These tasks can occur in
either environment, producer or consumer. Because of the
unique characteristics of each software product, a general
process for all contexts probably cannot be set; we should
interpret the SDP as a framework to be customized according to
the requirements and characteristics of each product. The
Software Delivery Process is a part of the Software
Development Process. In this context, this paper presents a
practical approach to address the problems of continuous
software delivery. The main objective is to contribute a setup
of servers, processes, techniques and tools that help to
deliver software continuously.


A. Fundamental concepts and related works 


There is a relation between the quality of software products
and the quality of the process used to build them.
Implementing a process aims to reduce rework and delivery time
and to increase product quality, productivity, traceability,
predictability and accuracy of estimates. Common practices to
avoid, each with its remedy, are:


• Deploying software manually: there should be only two tasks
to perform manually: (1) choosing a version and (2) choosing
the environment.

• Deploying only after development (requirements, design, code
and tests) is complete: it is necessary to integrate all
activities of the development process and have stakeholders
work together from the beginning of the project.

• Manual configuration management of production environments:
all aspects of configured environments should be applied from
version control in an automated process.
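
The first remedy above leaves only two manual choices, the
version and the environment. A minimal sketch of such an entry
point is given below. The environment names and the listed
steps are assumptions for illustration, and the function only
returns the plan instead of executing it.

```python
import argparse

ENVIRONMENTS = ("qa", "staging", "production")  # assumed environment names


def parse_args(argv=None):
    """The only two manual inputs: which version, and which environment."""
    parser = argparse.ArgumentParser(description="Deploy a released version.")
    parser.add_argument("version", help="version to deploy, e.g. 1.4.2")
    parser.add_argument("environment", choices=ENVIRONMENTS)
    return parser.parse_args(argv)


def deployment_plan(version, environment):
    """Return the automated steps; a real script would execute them."""
    return [
        f"fetch artifact {version} from the artifact repository",
        f"apply version-controlled configuration for {environment}",
        f"install {version} on {environment}",
        f"run smoke tests on {environment}",
    ]


# The list below stands in for the command line, e.g. "deploy 1.4.2 staging".
args = parse_args(["1.4.2", "staging"])
for step in deployment_plan(args.version, args.environment):
    print(step)
```

Everything after the two choices is scripted, which is what
makes the deployment repeatable across environments.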


B. A Practical Approach 


1) Main Proposal: The main goal of the setup shown in Figure 3
is to provide an infrastructure that enables continuous
software delivery. It has four areas: Commit Stage (CS),
Quality Assurance (QA), Staging (ST) and Production (PD).

2) Areas: The Commit Stage (CS) has the primary responsibility
of implementing continuous integration of all code revisions
sent to the repository. This area consists of the following
services:

• Public Code Repository
• Continuous Integration
• Static Analysis
• Peer Review
• Canonical Repository
• Repository Libraries

The Quality Assurance (QA) area has the main purpose of
performing all automated tests and allowing the Quality
Manager to perform manual tests, such as exploratory testing.
This area consists of the following services:

• Continuous Integration
• Page, Application and Database Servers
• Load Test

IV. CONTINUOUS DELIVERY WITH JENKINS 


Continuous delivery (CD) is a software engineering approach
in which teams produce software in short cycles, ensuring that
the software can be reliably released at any time. It aims at
building, testing and releasing software faster and more
frequently. The approach helps reduce the cost, time and risk
of delivering changes by allowing for more incremental updates
to applications in production. A straightforward and
repeatable deployment process is important for continuous
delivery. The continuous delivery approach gives an
organization the ability to bring services to the market
rapidly, efficiently and reliably, and thus to stay a step
ahead of the competition.


A. The Jenkins Platform 


Jenkins is an open source automation server written in Java.
Jenkins helps to automate the non-human part of the software
development process. Jenkins security depends on two factors:
access control, which can be customized in two ways (user
authentication and authorization), and protection from
external threats such as CSRF attacks and malicious builds,
which is supported as well.

The traditional way of managing software development, known as
the Waterfall model, is a logical and sequential model whose
big flaw is that it does not easily apply to reality: one of
the key concepts of this model is the definition of frozen
requirements in a preliminary and isolated stage. This
preliminary step initiates the waterfall of subsequent steps,
which can take weeks or months to complete, involving design,
implementation, verification, and deployment. During the time
needed to go through the whole waterfall of steps, the
requirements are likely to change, due to the volatility of
the market, to new competitors, or simply to an improvement of
the original project.

Jenkins was again enhanced and extended, and moved from being
solely a CI tool to a CD platform, allowing Dev, QA and Ops
teams to work closely together using the same orchestrator
(Fig. 1) and embracing the key points of CD:

• Collaboration across all the teams involved in the product
lifecycle
• Extensive automation of the delivery process

In Jenkins this is accomplished thanks to plugins that make it
possible to chain jobs together, promote the execution of jobs
and allow human intervention.

Fig. 1. Jenkins as orchestrator tool (figure title: Jenkins:
the Hub of Continuous Delivery)


B. Puppet/Chef


Puppet and Chef are the IT automation tools most commonly used
by DevOps teams to set up infrastructure, speeding up the
installation of the required software, middleware and
dependencies. From a technical point of view, integrating
Jenkins with Puppet/Chef is no more difficult than
establishing an SSH connection: the Puppet Master and the Chef
Server ensure that the managed servers are configured properly
and eventually deploy the artifact generated by Jenkins.

However, although Jenkins can keep track of all the files
generated by a build, assigning each an MD5 checksum that is
stored in the Jenkins fingerprint database [9], traceability
is lost once the artifact leaves the Jenkins environment to be
deployed by Puppet/Chef:

• Where does the artifact come from?
• Which build generated the artifact?
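
The idea behind fingerprinting can be illustrated independently
of Jenkins. The sketch below is not the Jenkins implementation
or its API; it is a hypothetical in-memory registry that maps
the MD5 checksum of an artifact to the job and build that
produced it, so that both questions above can be answered for
any copy of the file.

```python
import hashlib


def md5_fingerprint(data: bytes) -> str:
    """Return the MD5 checksum of the artifact contents as a hex string."""
    return hashlib.md5(data).hexdigest()


class FingerprintRegistry:
    """Hypothetical stand-in for a fingerprint database."""

    def __init__(self):
        self._origin_by_checksum = {}

    def record(self, artifact: bytes, job: str, build_number: int) -> str:
        """Remember which job and build produced this artifact."""
        checksum = md5_fingerprint(artifact)
        self._origin_by_checksum[checksum] = (job, build_number)
        return checksum

    def origin(self, artifact: bytes):
        """Return (job, build_number) for a known artifact, else None."""
        return self._origin_by_checksum.get(md5_fingerprint(artifact))


registry = FingerprintRegistry()
artifact = b"contents of app-1.4.2.war"  # placeholder bytes for the example
registry.record(artifact, job="app-build", build_number=57)

# Later, on a deployed server, the same bytes lead back to their origin.
print(registry.origin(artifact))
print(registry.origin(b"some unknown file"))
```

Traceability is preserved only if the deployment tool consults
or updates such a record, which is the gap described above.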


C. Jenkins and Workflow 


With CD, Jenkins has become the most-used orchestrator 
for the different phases of the product life cycle: from the 
checkout of the code and the unit-tests, to the static code 
analysis, to the performance tests, to the release of the binaries 
until the deployment into test/staging/production. To make this 
possible, Jenkins needed to be enhanced in order to provide 
capabilities to support different orchestration strategies and 
visualization tools. This resulted in a wide set of plugins
which allow the implementation of complex and non-sequential
pipelines, involving different jobs (one for each phase) and
promotion strategies. Although these plugins (e.g. Build Flow,
Continuous Delivery Pipeline) work together and generally do
not conflict with each other, looking for the right way to
implement the pipeline, choosing the right plugin amongst
many, and using a different plugin for each specific
functionality tends to be painful and time consuming, and
results in long chains of jobs that are difficult to manage.


V. GUIDEAUTOMATOR: CONTINUOUS DELIVERY OF END 
USER DOCUMENTATION 


A user guide or user's guide, also commonly known as a manual,
is a technical communication document intended to give
assistance to people using a particular system. It is usually
written by a technical writer, although user guides are also
written by programmers, product or project managers, or other
technical staff, particularly in smaller companies. User
guides are most commonly associated with electronic goods,
computer hardware and software. User guides usually include
screenshots that show users how to interact with the user
interface. Because creating such screenshots is a slow, manual
process, keeping the user guide up to date with changes in the
user interface is challenging.

The GuideAutomator DSL currently contains instructions 
for opening a URL, filling in forms, clicking on an element, 
surrounding elements with red rectangles for highlighting, tak- 
ing screenshots, and cropping screenshots around an element. 
Details about all instructions can be found on
GuideAutomator's website.


A. GuideAutomator 


GuideAutomator is an automated documentation generator. It is
based on tools and principles from functional testing and
plain text documentation. Functional testing is the process of
checking whether the behavior of a software system conforms to
its specification, from the end user's point of view, by
executing test cases derived from its specification.
GuideAutomator is a command-line tool that takes as input a
text-only document with a specific format and creates a PDF
and an HTML file representing the user guide, containing both
text and screenshots. The input format mixes Markdown text
with code chunks in the GuideAutomator DSL, a domain-specific
language based on the Selenium WebDriver API, to control a web
browser and take screenshots.

The tool that implements our approach, GuideAutomator, 
is still an early prototype. Here we enumerate challenges that 
need to be overcome to make the approach feasible for end 
user documentation on an industrial scale. 


B. Usability issues 


Currently GuideAutomator is a command line tool with 
no graphical user interface. The technical writer has to use 
general-purpose developer tools embedded in the web browser 
to get the information needed to write the screen capturing 
code embedded in the user guide, which requires knowledge 
of web design and web programming. 


C. Feature limitations 


Currently the output options are very limited; the tool
generates HTML and PDF files with a predefined layout that
cannot be customized. There is no way to create a cover page
or add headers and footers. In future work, we intend to add
customization options to the tool.


VI. INTRODUCING CONTINUOUS DELIVERY OF MOBILE
APPS IN A CORPORATE ENVIRONMENT: A CASE STUDY


Starting from Rugby's release management workflow, we
identified the needs of project-based organizations developing
mobile applications. Varying characteristics and restrictions
of project teams in corporate environments affect both process
and infrastructure. We found that the applicability and
acceptance of continuous delivery in industry depend on its
adaptability. To address issues in industrial projects with
respect to delivery process, infrastructure, neglected testing
and continuity, we extended Rugby's workflow and made it
tailorable.

To deal with digital transformations, agile methods like 
Scrum advocate flexibility, efficiency and speed. Many soft- 
ware companies succeeded in incorporating agile practices into 
their development workflow. Continuous software engineering 
refers to the organizational capability to develop, release, and 
learn from software in rapid parallel cycles. 


A. Rugbys Release Management Workflow 


Rugby is a lightweight process model that includes concepts of 
Scrum and the Unified Process. Fig. 2 shows Rugby’s release 
management workflow. It incorporates version control, 
continuous integration (CI), continuous delivery (CD), and 
a feedback mechanism. The workflow starts each time a 
developer pushes source code to the version control server. 
This leads to a new build on the CI server, which notifies the 
developer about the build status. If the build was successful 
and has passed all test stages, the release manager can 
release the build on the CI server, which uploads the build 
to the CD server. Users are automatically notified about the 
availability of the new release so that they can download it onto 
their devices. Within the application, they can give feedback, 
which is uploaded to the CD server and forwarded to the issue 
tracker, notifying the developer. This workflow allows practices 
like “release early, release often”, well established in open 
source software development. 
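
The push, build, release and feedback cycle described above can be sketched as a small model. This is only an illustrative sketch: the class and method names (Build, Workflow, push, release, feedback) are assumptions made for the example and are not part of Rugby or of any CI/CD product. 

```python
from dataclasses import dataclass, field


@dataclass
class Build:
    commit: str
    tests_passed: bool
    released: bool = False


@dataclass
class Workflow:
    """Toy model of the push -> build -> release -> feedback cycle."""
    notifications: list = field(default_factory=list)
    issues: list = field(default_factory=list)

    def push(self, commit: str, tests_pass: bool) -> Build:
        # A push to version control triggers a build on the CI server,
        # which notifies the developer about the build status.
        build = Build(commit, tests_pass)
        status = "succeeded" if tests_pass else "failed"
        self.notifications.append(f"developer: build {commit} {status}")
        return build

    def release(self, build: Build) -> bool:
        # Only builds that passed all test stages may be released;
        # releasing uploads the build to the CD server and notifies users.
        if not build.tests_passed:
            return False
        build.released = True
        self.notifications.append(f"users: release {build.commit} available")
        return True

    def feedback(self, build: Build, text: str) -> None:
        # In-app feedback is forwarded to the issue tracker,
        # which notifies the developer.
        if not build.released:
            raise ValueError("feedback requires a released build")
        self.issues.append((build.commit, text))
        self.notifications.append(f"developer: new issue for {build.commit}")
```

In this sketch a failed build can never reach users, which mirrors the role of the release manager in the workflow. 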


B. Applicability in industrial projects 


We analyzed the development process for mobile applications 
in a global company with heterogeneous project environments 
with respect to team size and project duration. We also 
conducted interviews. 

Fig. 2. Rugby’s release management workflow 


The answers of the interviews revealed the following key 
issues. The infrastructure is not sufficient for the delivery of 
mobile applications. Automatic testing is neglected in favor of 
development speed, mainly because the setup effort is too 
high in mobile projects. Continuity is missing in the 
development. Mobile applications target different platforms: 
some are developed natively, but in different programming 
languages, others using cross-platform frameworks. 
The continuous delivery workflow should therefore support 
multiple platforms. 


C. Extending and tailoring Rugby’s workflow 


The extended workflow supports tailoring by distinguishing 
between mandatory and optional activities. In 
particular, projects can modify the workflow depending on 
project size, complexity, staffing, timeline and priorities. Each 
project can thus instantiate its own tailored version of the 
workflow. 
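
The idea of tailoring by mandatory and optional activities can be illustrated with a minimal sketch. The activity catalogue below, and in particular which activities are marked mandatory, is a hypothetical example and is not taken from the case study. 

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    name: str
    mandatory: bool


# Hypothetical catalogue; the mandatory/optional split is an assumption.
CATALOGUE = [
    Activity("version control", True),
    Activity("continuous integration", True),
    Activity("automated tests", False),
    Activity("code metrics", False),
    Activity("continuous delivery", True),
    Activity("in-app feedback", False),
]


def tailor(catalogue, selected_optional):
    """Return the activity names of a tailored workflow, in catalogue order."""
    known_optional = {a.name for a in catalogue if not a.mandatory}
    unknown = set(selected_optional) - known_optional
    if unknown:
        raise ValueError(f"not optional activities: {sorted(unknown)}")
    # Mandatory activities are always kept; optional ones only if selected.
    return [a.name for a in catalogue
            if a.mandatory or a.name in selected_optional]
```

A small project might call tailor(CATALOGUE, set()) and keep only the mandatory activities, while a larger one could add automated tests and in-app feedback. 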


D. Evaluation 


The goal was to measure the effect of our extended and 
tailorable workflow on adoption and impact of continuous 
integration and delivery in mobile projects. We evaluated 
how the workflow is applied by project teams and how it 
influences their development process. Regarding continuous 
integration, we were concerned with the degree to which the 
project has adopted a continuous workflow as well as what 
and how they test, which metrics they collect and how they 
use them. 

Regarding continuous delivery, we were interested in the 
value of using the delivery service in combination with an 
integration system. We also looked at how this transforms the 
quantity, quality and nature of feedback the project receives 
from users. 

1) Study Design: To evaluate our tailored workflow, we 
introduced it in eight mobile projects of CapGemini with 
customers from different industrial sectors. We asked project 
managers to participate in a survey including both qualitative 
and quantitative questions. Survey participants were project 
managers with multiple years of experience in the mobile 
domain and insights into how mobile projects work. All 
surveyed projects were familiar with agile methodologies and 
the integration system was already known within the company. 

2) Findings: We grouped our findings into five categories 
according to our research questions: integration, testing, met- 
rics, delivery, and feedback. 


i) Integration: While the primary goal of our solution is 
to move projects to continuous delivery, a prerequisite 
is to move them to continuous integration. Usage 
of an integration system increases with a turn-key 
solution available, although some projects have their 
build server located at the client due to special 
requirements. Dependencies are now managed in a more 
structured manner with most projects using both a 
dependency management system and version control 
system. 

ii) Testing: Keeping in mind that the effort of defining 
and implementing tests remains unchanged, we looked 
at whether projects test more or differently when the 
right infrastructure is made available. 

iii) Metrics: Our integration system allows projects to 
collect various metrics. This allows developers and 
project managers to gain deeper insights into the code 
base. 

iv) Delivery: Having implemented a custom delivery ser- 
vice that supports various degrees of automation, we 
were interested in its impact on complexity, effort, and 
duration of delivery. 

v) Feedback: Finally, we were interested in whether 
applying continuous delivery not only saves time and 
effort but also increases the quantity and/or quality of 
feedback that can be collected from users. 


3) Threats to Validity: We designed the survey to gain 
insight into the development process of mobile projects and 
to measure the impact of our release management workflow. 
Our results may be subject to the following limitations. 


i) Reliability: Duration of the evaluation period, number 
of metrics, or level of detail may have influence on 
the reliability of our results. 

ii) Generalizability: Number of projects and variation of 
project characteristics may be too low to 
achieve generalizable results. 

iii) Selection bias: Projects participating in our survey may have 
already worked in an agile fashion and had a special 
interest in automated integration and delivery. 

iv) Research bias: Bias caused by an appreciation for agile 
in general or our solution in particular as well as 
the positive results of our previous study may have 
influenced the wording of our questions. 


VII. KEEPING CONTINUOUS DELIVERY SAFE 


Agile software development practices have seen a major 
boom in the past decades due to their ability to cope with 
changing requirements and enable short release cycles. This 
has culminated in today’s movements of DevOps and continuous 
software engineering. The latter aims to integrate every aspect 
of software development, starting from source code and code 
analysis up to the delivery to production environments, in a 
Continuous Integration Pipeline. The ultimate goal is being 
able to deliver the latest software to the customer at the push of 
a button. Safety is arguably the most significant difference 
between cyber-physical systems and application software. Since 
agile and continuous software development is spreading to 
cyber-physical systems, it is crucial to investigate how safety 
requirements can be integrated into these concepts. 


A. Continuous Safety Builds 


The delivery pipeline of safety-critical cyber-physical systems 
such as cars differs from that of application software in several 
ways, the most obvious being hardware integration and safety 
testing, and the latter stands out. For all other steps, there are 
well-established or experimental implementations that allow 
integration into a Continuous Delivery Suite, but not for safety 
analysis and testing. All steps in the pipeline use the result of the 
previous step and combine it with one or more artifacts, mainly 
automated test scripts. Other examples of artifacts are 
rules for static code analysis, hardware or third-party libraries. 
All of them have in common that they are developed in parallel 
to the source code, kept up to date on central infrastructure, 
and that an update in any of them should cause a new 
build. Based upon this notion, we formulate the following 
guidelines for Continuous Safety Builds. 

1) Safety Analysis Performed in Parallel to Development: 
An up-to-date safety analysis is the first prerequisite for 
continuous safety testing. The analysis must be maintained 
iteratively in line with the requirements and source code, which 
integrates safety analysis into Scrum sprints. Some analysis 
methods are designed to be iterative with respect to a changing 
system design or architecture. 

2) Automated Safety Test Execution and Generation: Auto- 
mated execution of safety test cases is an obvious requirement, 
but the test cases need to be automatically generated from 
the analysis as well. Since the safety analysis needs to be 
integrated into development, the necessary manual work needs 
to be reduced to a minimum. The safety analysis has to be 
transformed into concrete test cases that can be executed on 
the software. This step is often done manually today, but there 
are approaches to automate this process fully. 

3) Safety Analysis as an Artifact like Source-Code or Test 
Scripts: The Safety Analysis and its results should be treated 
in no way differently from other artifacts required for a 
build. Like source code, tests and configuration, evolving 
software analysis results need to be stored in a central repository 
in a versioned manner. 

4) Safety Tests in Every Build: This ensures that every build 
which might eventually be delivered to production has passed 
an automatic safety check. If this check is not passed, the 
build breaks, and thus we ensure that potentially unsafe software 
is not deployed even in a Continuous Delivery scenario. 
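
The four guidelines can be illustrated with a minimal sketch of a build step that generates safety tests from the analysis and breaks the build when any of them fails. All names here (SafetyConstraint, generate_tests, deliverable) and the idea of expressing an analysis result as a numeric limit per scenario are assumptions made for illustration; they do not describe a specific safety analysis method or tool. 

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class SafetyConstraint:
    """One result of a hypothetical safety analysis, kept under version control."""
    scenario: str
    limit: float  # the controller output must not exceed this value


def generate_tests(analysis: List[SafetyConstraint]):
    # Test cases are generated from the analysis rather than written by hand.
    def make_test(constraint):
        return lambda controller: controller(constraint.scenario) <= constraint.limit
    return [(c.scenario, make_test(c)) for c in analysis]


def failed_safety_tests(controller: Callable[[str], float],
                        analysis: List[SafetyConstraint]) -> List[str]:
    """Run every generated test and return the scenarios that failed."""
    return [name for name, test in generate_tests(analysis)
            if not test(controller)]


def deliverable(controller, analysis) -> bool:
    # A single failed safety test breaks the build.
    return not failed_safety_tests(controller, analysis)
```

Because the analysis is an input to the build just like the source code, changing either one leads to a new set of generated tests on the next build. 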


B. Implication 


Our propositions will impact the way safety-critical software 
is developed. 

1) Seamless Integration of Safety and Software Engineer- 
ing: In Safe Scrum, every sprint in a scrum process is 
expanded by a safety backlog in addition to the standard 
sprint backlog, which includes the safety-relevant issues that 
are changed in a single iteration. 

2) Every Developer Becomes a Safety Engineer: Since we 
propose that tests generated from the safety analysis are able 
to break the build, this requires an additional check before 
committing to the mainline and thus triggering a build. At 
the very least, developers have to be aware of whether or not 
their commits affect any of the safety-relevant components 
and change system behaviour. If they do, the developers need to 
ensure that the safety analysis is updated together with the changed 
source code and/or requirements to keep the build running. If 
the changes are small, they might even be able to make the 
adjustments themselves. 

3) Dedicated Safety Engineers are Supported by Devel- 
opers, Not Replaced: Including developers in the process 
of safety engineering does not mean that dedicated safety 
engineers become obsolete. The necessity to view safety from 
a system perspective still holds, because developers will likely 
focus on the aspects of the module they are working on. 


C. Expected Benefits and Impact 


We have presented a new aspect of safety engineering. 
Further research in this area will greatly help us as 
researchers to understand how producers of safety-critical 
systems can set up and maintain fast delivery cycles. At the 
same time, researchers can help such industries to find solutions 
to challenges they encounter in doing so. 


VIII. CONCLUSION 


From these observations we conclude that, to practice CD 
effectively, applications should meet a set of 
ASRs. The list of ASRs reported here is not intended to 
be exhaustive. Continuous delivery with Jenkins facilitates 
the creation of complex workflows, allows traceability, 
reduces the time to market and improves productivity. 
Keeping software systems and their documentation in sync is 
a challenge, especially in continuous delivery environments, 
in which every change to the system is potentially shippable. 


Updating user guides to reflect the latest changes is even 
harder, since it requires recreating scenarios in the system 
that were used to take the original screenshots. Safety 
analysis and testing were identified as open issues for the 
wide-scale establishment of continuous delivery in industries 
producing safety-critical systems. Guidelines for dealing with 
these issues were formulated in the form of continuous safety 
builds, and it was proposed to treat the results of a safety 
analysis no differently from any other artifact. 


Abstract—This paper gives an overview of electronic payment 
methods and systems. E-commerce provides the capability 
of buying and selling products, information and services on 
the internet and other online environments. In an e-commerce 
environment, payments take the form of money exchange in 
electronic form, and are therefore called electronic payments. 
When an e-payment system is secure, there is no threat to the 
user’s credit card number, smart card or other personal details, 
and payment can be carried out without the involvement of a third party. 
It allows e-payment at any time through the internet, directly 
to the transfer settlement, and forms an e-business environment. 
Electronic payment systems can be grouped into three broad 
classes: traditional money transactions, digital currency and 
credit-debit payments. Such payment systems have a number 
of requirements, e.g. security, acceptability, convenience, cost, 
anonymity, control, traceability and control of encryption methods. 


I. INTRODUCTION 


E-commerce provides the capability of buying and selling 
products, information and services on the Internet and other 
online environments. In an e-commerce environment, pay- 
ments take the form of money exchange in an electronic form, 
and are therefore called Electronic Payment. The merchant sells 
the goods to the customer, and the customer pays the price with the 
help of an E-Payment system. In the offline world, payments are made 
with cash or by cheque. In online sales, accepting payment 
is a crucial aspect of the transaction. Electronic banking is an 
industry which allows people to interact with their banking 
accounts via the internet from virtually anywhere in the world. 
The electronic banking system addresses several emerging 
trends: customers’ demand for anytime, anywhere services, 
product time-to-market imperatives and increasingly complex 
back-office integration challenges. For many people, electronic 
banking means 24-hour access to cash through an automated 
teller machine (ATM) or Direct Deposit of pay checks into 
checking or savings accounts. But electronic banking involves 
many different types of transactions, rights, responsibilities 
and fees. When an E-Payment system is secure, there is no 
threat to the user’s credit card number, smart card or other 
personal details, and payment can be carried out without the 
involvement of a third party. It allows e-payment at any time through 
the internet, directly to the transfer settlement, and forms an 
E-business environment. The research issues include security, 
power consumption and communication, hybrid networks, data 
consistency, and environment awareness. The electronic 
payment system is considered the backbone of e-commerce and 
one of its most crucial aspects. It can be defined as a payment 
service that utilizes information and communication 
technologies including integrated circuit (IC) cards, cryptography, 
and telecommunication networks (Raja et al., 2008). An 
efficient electronic payment system lessens the cost of trading 
and is thought to be essential for the functioning of capital 
and inter-bank markets. With the advancement of technology, 
electronic payment systems have taken many forms, including 
credit cards, debit cards, electronic cash and check systems, 
smart cards, digital wallets, contactless payment methods and 
mobile payments. The future of a specific electronic 
payment system depends upon how it overcomes the practical 
and analytical challenges faced by various means of online 
payments. These challenges include issues of law and regu- 
lation (buyer and seller protection), technological capabilities 
of e-payment service providers, commercial relationships, and 
security considerations such as verification and authentication 
issues (Paunov and Vickery, 2006). 


II. ONLINE PAYMENT SYSTEMS 


A wide variety of online payment systems 
have been developed in the past few years, and these systems 
can be broadly classified into account-based and electronic 
currency systems. Account-based systems allow users to make 
payments via their personal bank accounts; whereas the other 
system allows the payment only if the consumer possesses an 
adequate amount of electronic currency. These systems offer 
a number of payment methods that include: 

• Electronic payment cards (debit, credit, and charge cards) 

• E-wallets 

• Virtual credit cards 

• Mobile payments 

• Loyalty and Smart cards 

• Electronic cash (E-cash) 

• Stored-value card payments 


A. Credit Cards 


Credit cards have so far been the most commonly used online 
payment mode. Initially, security concerns 
hindered the adoption of credit cards for making online 
payments, but later, with the provision of more secure features 
to protect every transaction made, customers developed trust 
in the use of credit cards. The applicability of credit cards is a 
strong factor that contributed to their wide use throughout the 
world. One of the major advantages of credit cards is their 
ease of use: online transactions can be made in 
no time and from anywhere. These cards are easy to obtain 
and use, as customers don’t need to purchase any extra software 
or hardware to work with them. Cardholder authentication 
procedure is also simple, with the provision of a name, credit 
card number, and expiry date. For the security of consumers’ 
personal information, credit card companies have developed 
a number of complementary systems including MasterCard 
SecureCode and Verified by Visa. These systems allow users 
to create a password and use it when they shop online through 
their credit cards. 


B. Debit Cards 


Debit cards draw funds directly from the consumer’s bank 
account. This makes it difficult for consumers to handle payment 
disputes, as their funds don’t have extra protection in a debit 
account. For debit payments, providing the account number is 
enough, without the necessity of producing a physical card 
or card number. The use of debit cards is particularly high 
in most countries, with a specific user base depending on the 
conditions and regulations attached to the issuance of credit 
cards. 


C. Mobile Payments 


Payments made through wireless devices like mobile phones 
and smartphones are thought to provide more convenience, 
reduce the fee for the transaction, and increase the security 
of electronic payment. This payment system has also made 
it easier for businesses to collect useful information about 
their customers and their purchases. Mobile payment methods 
are suitable for offline micropayments as well as for online 
purchases. This method is a potential attraction for online 
traders due to the enormous user base of mobile phones. The 
use of a mobile payment service not only reduces the overall 
cost of a transaction but also offers better payment security. 


D. Mobile Wallets 


A mobile wallet is formed when a smartphone functions 
like a leather wallet: it can hold digital coupons, digital money 
(transactions), digital cards, and digital receipts. A mobile wallet 
service allows users to install an application from online 
stores on their smartphones and use it to pay for their 
online and offline purchases. Using the latest technologies that 
connect smartphones to the physical world, such as NFC (Near 
Field Communication), sound waves, QR codes and cloud-based 
solutions, mobile wallets are believed to provide more 
convenient payment solutions to customers in future. 


E. Electronic Cash 


Electronic cash systems were proposed in the form of DigiCash 
or CyberCash. However, these systems were not much 
appreciated and soon disappeared. At present, smart card-based 
systems are more commonly used for the payment 
of small amounts by many businesses. Smart cards usually 
rely on specific hardware and a card reader for their use and 
authentication. 

The following steps are carried out for payments during 
online procedures: 


1) The payment procedure is initiated by the applicant. The 
applicant selects a bank. 

2) A payment request is sent to the bank that contains an 
XML message with a redirection URL that points to the 
government application. In response, the bank opens a 
session and forwards the user to the given URL. 

3) The authority’s application forwards the applicant on to 
the online banking application of his bank. After he has 
been authenticated, the payment transaction is carried out. 

4) Before the transaction is carried out, the bank checks if 
there is still a connection open between the bank and the 
authority. 

5) After the connection is confirmed by the authority, the 
bank carries out the money transfer. 

6) A confirmation message is sent to the authority stating 
whether the payment was successful or not. 

7) The authority responds with an acknowledgement mes- 
sage. 

8) The payment process is finalized and the applicant is 
referred back to the authority’s application. 
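
The eight steps above can be traced with a short sketch. It is a simplified illustration only: the XML element names, the example URL and the balance check in step 5 are assumptions made for the example and are not specified in the procedure itself. 

```python
import xml.etree.ElementTree as ET


def build_payment_request(amount: str, redirect_url: str) -> str:
    # Step 2: the request is an XML message carrying a redirection URL.
    # The element names are hypothetical.
    root = ET.Element("paymentRequest")
    ET.SubElement(root, "amount").text = amount
    ET.SubElement(root, "redirectUrl").text = redirect_url
    return ET.tostring(root, encoding="unicode")


def run_payment(amount, balance, authenticated=True, connection_open=True):
    """Return (success, log) for one pass through the eight steps."""
    log = ["1 applicant initiates payment and selects bank"]
    request = build_payment_request(str(amount), "https://authority.example/apply")
    url = ET.fromstring(request).findtext("redirectUrl")
    log.append(f"2 bank opens session, forwards user to {url}")
    log.append("3 authority forwards applicant to online banking")
    if not authenticated:
        return False, log + ["3 authentication failed"]
    log.append("4 bank checks connection to authority")
    if not connection_open:
        return False, log + ["4 connection lost, no transfer"]
    # Assumption: the transfer succeeds only if the balance covers the amount.
    success = balance >= amount
    log.append("5 bank carries out transfer" if success else "5 transfer refused")
    log.append(f"6 bank confirms to authority: success={success}")
    log.append("7 authority acknowledges")
    log.append("8 applicant referred back to authority's application")
    return success, log
```

Note that the connection check in step 4 happens before any money moves, so a lost session never results in a transfer the authority does not learn about. 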


III. E-PAYMENT MODELS AND SECURITY 


Electronic payment systems can be grouped into three broad 
classes: traditional money transactions, credit-debit payments 
and digital currency. 


Traditional money transactions: A problem with this 
type of transaction is that the credit card details must be 
handled confidentially. For secure credit card transactions, a 
customer’s credit card number is encrypted using public key 
cryptography, so that it can only be read by the merchant. 
The big advantage of this approach is that the customer does 
not need to be registered with a network payment service; 
all that is needed is a credit card number. However, without 
registration of customers, the encrypted credit card transaction 
does not constitute a signature: anyone with knowledge of the 
customer’s credit card number can create an order for payment. 


Credit-debit payments: Customers are registered with 
accounts on payment servers and authorize charges against 
those accounts. These systems depend on conversion to/from 
real currency. With the debit or check approach, the customer 
maintains a positive balance that is debited when a debit 
transaction or check is processed. With the credit approach, 
charges are posted to the customer’s account and the customer 
is billed for or subsequently pays the balance of the account 
to the payment service. Electronic credit/debit money can 
be represented by a digital cheque, just like a normal bank 
cheque. This payment method is by definition not anonymous. 
In this scheme the bank hands out customer-specific blank 
cheques to its customers. 
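
The difference between the debit and the credit approach can be made concrete with a minimal sketch. The class names and the credit limit are assumptions made for the example; amounts are integers in the smallest currency unit. 

```python
class DebitAccount:
    """The customer keeps a positive balance that is debited per transaction."""

    def __init__(self, balance: int):
        self.balance = balance

    def charge(self, amount: int) -> bool:
        if amount > self.balance:
            return False  # a debit account cannot go negative
        self.balance -= amount
        return True


class CreditAccount:
    """Charges are posted to the account and paid by the customer later."""

    def __init__(self, limit: int):
        self.limit = limit  # hypothetical credit limit
        self.owed = 0

    def charge(self, amount: int) -> bool:
        if self.owed + amount > self.limit:
            return False
        self.owed += amount
        return True

    def settle(self, payment: int) -> int:
        # The customer pays (part of) the balance to the payment service.
        self.owed -= min(payment, self.owed)
        return self.owed
```

In both cases the payment server knows which account was charged, which is why this class of systems is not anonymous. 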

Digital currency: Digital money, an encoded string of 
digits, can be carried on a smart card or stored on a computer 
disk. Like a traveler’s check, a digital coin is a floating claim 
on a bank or other financial institution that is not linked to 
any particular account. One cardholder can make a payment 
to another without bank involvement, by placing both cards in 
a digital wallet that moves coins from one card to the other. 

Security Requirements for Electronic Payment Systems: 
Two common protocols are identified that ensure secure 
e-commerce transactions: the Secure Sockets Layer (SSL) 
protocol and Secure Electronic Transaction (SET) [5]. SSL is 
the more commonly used protocol for e-commerce 
transactions; it works by encrypting the entire 
session between computers, providing 
safer communication over the internet. SSL encrypts the online 
communication between Web servers and a client by using 
public-key technology. The SET protocol, on the other hand, works 
by preventing the consumer’s entire credit card number from 
traveling across the internet and instead allows only pieces of it 
to flow through web communication. Electronic payment systems 
must also meet some other requirements, including: 


• Confidentiality of information shared by consumers 

• Data integrity 

• Authentication of all the participants 

• Non-repudiation 

• End-user requirements that include usability, flexibility, 
affordability, reliability, speed of transactions, and 
availability. 
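
Two of the requirements listed above, data integrity and authentication, can be illustrated with a message authentication code. The sketch below uses HMAC-SHA256 from the Python standard library; this is a swapped-in illustration and is neither SSL nor SET. It assumes the two parties already share a secret key, and it does not provide confidentiality or non-repudiation, which would need encryption and digital signatures respectively. 

```python
import hashlib
import hmac


def tag(key: bytes, message: bytes) -> str:
    """Return an HMAC-SHA256 tag for the message."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, message: bytes, received_tag: str) -> bool:
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(tag(key, message), received_tag)
```

A receiver that recomputes the tag detects both a modified payment message and a message produced by someone who does not hold the key. 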


IV. REQUIREMENTS OF E-COMMERCE 


In electronic commerce at least two sets of parties will need 
to participate: customers and merchants on the one hand, and 
financial institutions and regulators on the other hand. 


A. Concerns of customers and merchants 


i) Security. Electronic currency is just data and is easily 
copied. It has to be assured that no-one else can divert 
a payment or impersonate another person in order to 
steal his funds. 

ii) Acceptability. A wide range of parties needs to accept 
the payment. 

iii) Convenience. This includes speed, reliability, fungibility 
(the currency or payment unit should be divisible), 
transferability (peer-to-peer payments) and minimal 
specific hardware. 

iv) Cost. Preferably no additional cost, hence no effective 
lower limit to the value of a transaction. Transaction 
costs include any direct costs, at the customer, merchant 
and any intermediary, as well as processing 
or handling time for all parties. 

v) Privacy. No external party (individual, company or 
other authority) can create a historic record of any 
individual’s cash transactions. With electronic money, 
the bank or any other party should not be able to 
determine whether two payments were made by the 
same user. 

vi) Durability. The electronic money should not be easily 
lost: for example, when a system crashes. 
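
The security concern in item i), that electronic currency is just data and is easily copied, is commonly handled by letting the issuer check every coin when it is redeemed. The sketch below is a minimal illustration of that idea under stated assumptions: the Bank class and its methods are hypothetical, and the scheme requires an online check with the bank for every payment. 

```python
import secrets


class Bank:
    """Issues coins with unique serial numbers and redeems each one once."""

    def __init__(self):
        self._issued = {}    # serial number -> value
        self._spent = set()  # serial numbers already redeemed

    def issue(self, value: int) -> str:
        serial = secrets.token_hex(16)
        self._issued[serial] = value
        return serial

    def redeem(self, serial: str) -> int:
        if serial not in self._issued:
            raise ValueError("unknown coin")
        if serial in self._spent:
            raise ValueError("coin already spent")
        self._spent.add(serial)
        return self._issued[serial]
```

A copied coin has the same serial number as the original, so only the first redemption succeeds. Since the bank sees every redemption in this sketch, it trades away some of the privacy listed in item v). 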


B. Requirements for financial institutions 


The financial institutions that will provide services to 
enable these transactions in the marketspace, and regulators, 
will also have a set of requirements for a payment mechanism: 


i) Immediate control. Financial institutions and regu- 
lators will seek a system in which transactions are 
controlled or cleared individually, so that any breach 
of security can be identified as soon as possible. 

ii) Traceability. Financial institutions and regulators will 
seek a system in which transactions are traceable, 
so that if a crime is detected the culprit can be 
identified. In particular, traceability will be important 
to track international funds flows, tax evasion and 
money laundering. 

iii) Control over the spread of encryption mechanisms. 
A key concern of the government, and therefore any 
regulatory body, is to control the spread of encryption 
mechanisms. 


V. CONSUMER ADOPTION OF ELECTRONIC AND MOBILE 
PAYMENT 


The adoption of mobile payment methods is dependent 
upon several factors that affect the consumers’ choice and 
willingness to make use of latest technology for making 
payments. From a review of the literature on this topic, we have 
identified certain factors that affect the consumer adoption of 
mobile payment methods either positively or negatively. 

Advantages of mobile payment systems: Prior studies sug- 
gest that mobile payment methods provide their customers 
with a number of advantages including location-free access, 
a wide variety of purchase possibilities, an easy alternative 
to cash payments, and timely contact with their financial 
resources. These advantages have attracted consumers to make 
their payments via mobile devices. 


i) Convenience: Convenience (or compatibility) is 
explained as the consistency between an advancement 
and the experiences, values, and needs of consumers. 

ii) Complexity: Complexity in the use of various electronic 
payment methods, including smart cards and 
mobile payments, has contributed to the low adoption 
of these services. It is logical to expect that mobile 
payments will become less and less complex in future. 

iii) Costs: One of the major factors affecting consumer 
adoption of mobile payment systems is the effective 
cost of a transaction. 

iv) Security of mobile payment systems and trust in service 
providers: The lack of security and consumer trust 
in service providers is a major barrier to the adoption of 
e-commerce transactions. Consumers need confidentiality, 
authentication, data integrity, and non-repudiation 
as key requirements for making secure payments over 
the internet. 


VI. CONCLUSION 


We distinguish three categories of electronic payment systems: 
traditional money transactions, credit-debit payments and digital 
currency. Such payment systems have different strengths and 
weaknesses with 
respect to their requirements: security, acceptability, ease of 
use, transaction cost, additional cost (e.g. point-of-sale 
hardware), privacy/traceability, durability and immediate control. 
A contradiction exists between the customer’s right to privacy and 
anonymity and the desire of regulators and intermediaries 
to be able to trace any transaction in the economy. Another 
important trade-off to be made is between the need to verify 
transactions online and the ability to trust a transaction 
without the presence of a third party. 

A dedicated hardware solution might look to be the ideal 
technical solution in many senses, but raises some serious 
market and technical issues. On the one hand, the smart card, 
a secured piece of hardware, can help to solve the double 
spending problem in an offline environment. 

Payment methods have been through a series of evolutions 
from cash to checks, to debit cards and credit cards, and 
now to e-commerce and mobile banking. This study finds that 
customers are increasingly using mobile payment methods for 
their routine online purchases and for their on-site purchases 
as well. With advancing technology that supports 
mobile transactions and makes them transparent and more 
convenient, consumers have developed trust in and the habit 
of using mobile payment systems. The changing behaviour of 
consumers, shifting from traditional payment methods 
to more advanced online payment systems, is quite evident in 
banking and retailing, and with most of the mobile devices 
available. Since mobile devices have become an 
unavoidable part of almost everyone’s life, and given 
the opportunities this technology offers for online and offline 
payment in terms of convenience and security, it is inevitable 
that the use of mobile payment systems will rise further, with 
the ambition to surpass or even replace cash and other cashless 
payment options. 


This seminar aimed to cover a brief spectrum of 
possible issues with electronic payment methods and consumer 
adoption of e-commerce for making payments for purchases. 
Future research may focus on the validation of factors that 
can contribute to the successful adoption of mobile payment 
methods across the globe. 


Abstract—The IP network was built decades ago, and with 
today’s use of the Internet, a new network layer protocol is much 
needed. Named Data Networking (NDN) is a proposal for 
content-centric discovery and routing. 

Named Data Networking (NDN) represents an emerging 
Information-Centric Networking architecture. It enables data- 
centric security in network communication by mandating digital 
signatures on network layer data packets. Because NDN treats data as the 
central element and leverages in-network caching, traditional 
security mechanisms, tied to data location, can no longer be used. 
NDN embraces a complete data-centric security which ensures 
integrity, provenance, and secrecy of data itself, instead of relying 
on the security of the delivery session. I formally and informally prove 
the security of the protocols suggested for device registration and 
data pull under deployment in the application. 

Index Terms—NDN, Architecture, DNS, Security, Trust, HIBC 


I. INTRODUCTION 


As technology and trends change day by day, Named Data
Networking (NDN) has been introduced to cope with these
changes. NDN is a new architecture which is compatible with
today's Internet and is also a promising candidate for the
future Internet. The fast growth of e-commerce websites, online
video streaming sites, smartphones, and social media has made
the Internet a distribution network. NDN is a brand new
architecture, but one whose design principles are based on the
successes of today's Internet and which facilitates user choice
and competition. It has several advantages, such as transfer
efficiency, security, and mobility support. Using NDN makes
communication more secure and simple.

NDN changes the semantics of network service from delivering
a packet to a given destination address to fetching data
identified by a given name. NDN is one instance of a more
general network research direction called information-centric
networking (ICN), under which different architecture designs
have emerged [1]. The Internet Research Task Force (IRTF)
established an ICN research working group in 2012. In this
paper we also analyze security and trust in NDN. The main
enhancements are related to the modification of the name
structure. This paper is organized as follows: two later
sections analyze, respectively, security and trust in NDN.
Finally, the last section concludes the paper.
Fig. 1. NDN Architecture


II. NDN BACKGROUND


First, NDN uses named data instead of named hosts as its
communication model. Packets carry data names rather than
source or destination IP addresses. NDN offers in-network
caching as built-in functionality and provides name-based
routing. Secondly, routers are equipped with a cache
function. Thirdly, NDN replaces the traditional channel-based
transmission mode with a hop-by-hop one. Security in the present
Internet is an afterthought, whereas NDN builds security into
the data itself. This data-centric security can be extended to
infrastructure security.

NDN has shifted the communication paradigm from a host-oriented
model to a content-oriented model. The data-centric model is
meant to enhance the security protection of end-to-end
applications. The key built-in features of NDN include
in-network caching, sessionless communication, and hop-by-hop
forwarding.


III. NDN ARCHITECTURE


NDN uses two types of packets: Interest packets and Data
packets. Each Data packet carries a uniquely named chunk of
data, which is digitally signed by its producer and verified by
the receivers. The consumer sends Interest packets to request
data, and the corresponding Data packets flow back along the
same path in the reverse direction. NDN routes and forwards
packets based on names instead of IP addresses. This eliminates
address space exhaustion and address management.

To carry out the Interest and Data packet forwarding func- 
tions, each NDN router maintains three data structures: a 
Pending Interest Table (PIT), a Forwarding Information Base
(FIB), and a Content Store (CS), as well as a Forwarding 
Strategy module that determines whether, when and where to 
forward each Interest packet. 

The PIT stores all the Interests that a router has forwarded
but not satisfied yet. Each PIT entry records the data name
carried in the Interest, together with its incoming and outgoing
interface(s). In the absence of a matching PIT entry, the router
will forward the Interest toward the data producer(s) based
on information in the FIB as well as the router's adaptive
Forwarding Strategy. The Content Store is a temporary cache
of Data packets the router has received. Because an NDN Data
packet is meaningful independent of where it comes from
or where it is forwarded, it can be cached to satisfy future
Interests.

When a Data packet arrives, an NDN router finds the
matching PIT entry and forwards the data to all downstream
interfaces listed in that PIT entry. It then removes that PIT
entry and caches the Data in the Content Store. Data packets
always take the reverse path of Interests, and, in the absence of
packet losses, one Interest packet results in one Data packet on
each link, providing flow balance. To fetch large content objects
that comprise multiple packets, Interests play a role in
controlling traffic flow similar to that of TCP ACKs in today's
Internet: a fine-grained feedback loop controlled by the consumer
of the data. Neither Interest nor Data packets carry any host or
interface addresses; routers forward Interest packets toward data
producers based on the names carried in the packets, and forward
Data packets to consumers based on the PIT state information set
up by the Interests at each hop. This Interest/Data packet
exchange symmetry induces a hop-by-hop control loop (not to be
confused with symmetric routing), and eliminates the need
for any notion of source or destination nodes in data delivery,
unlike in IP's end-to-end packet delivery model.
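
The forwarding behaviour described in this section can be illustrated with a minimal Python sketch. The class and method names (Router, on_interest, on_data) and the use of integers for interfaces are assumptions made for illustration; they do not come from any NDN implementation. The Content Store check and the aggregation of duplicate Interests follow the usual description of NDN forwarding rather than the text above, and the Forwarding Strategy is reduced to a longest-prefix lookup in the FIB.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Toy NDN forwarder with a Content Store, PIT and FIB (illustrative only)."""
    fib: dict                                   # name prefix -> upstream face
    cs: dict = field(default_factory=dict)      # data name -> content
    pit: dict = field(default_factory=dict)     # data name -> set of downstream faces

    def longest_prefix_face(self, name):
        """Return the FIB face for the longest matching prefix, or None."""
        parts = name.strip("/").split("/")
        for end in range(len(parts), 0, -1):
            prefix = "/" + "/".join(parts[:end])
            if prefix in self.fib:
                return self.fib[prefix]
        return None

    def on_interest(self, name, in_face):
        """Return (action, target) describing what the router does."""
        if name in self.cs:                     # cached copy satisfies the Interest
            return ("data", in_face)
        if name in self.pit:                    # already forwarded: record the face only
            self.pit[name].add(in_face)
            return ("aggregated", None)
        out_face = self.longest_prefix_face(name)
        if out_face is None:                    # no route for this name
            return ("dropped", None)
        self.pit[name] = {in_face}
        return ("forwarded", out_face)

    def on_data(self, name, content):
        """Return the downstream faces the Data packet is sent to."""
        faces = self.pit.pop(name, None)        # the PIT entry is removed
        if faces is None:                       # unsolicited Data is discarded
            return []
        self.cs[name] = content                 # cache for future Interests
        return sorted(faces)
```

In this sketch a second Interest for the same name does not travel upstream again, and a Data packet without a pending Interest is never delivered, which is the flow balance property mentioned above.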


IV. NAME SERVICE FOR NDN 


The consumer sends Interest packets to fetch Data packets.
NDNS is designed to satisfy the need of NDN for a distributed
lookup service. NDN makes named and secured data packets the
centerpiece of the network architecture. The current Internet
suffers from a lack of mobility support and security
integration. NDN is one of the projects proposed to resolve the
critical Internet problems of complexity, security and
sustainability management, and to meet the demands of current
applications.

Generally, energy consumption can be divided into three
main factors: transmission energy, storage energy, and energy
related to heating, ventilation and air conditioning (HVAC)
operations. A large number of content replicas increases the
storage energy in NDN. However, these replicas bring the
information closer to the user, which reduces the
transmission energy. NDN has several properties that a lookup
service can build on, such as:


1) Name-Based Forwarding: The exchange of Interest packets
happens at the network layer. The network layer forwards
packets by name: each NDN router maintains a FIB, which is
logically a table of tuples.

2) Per-Packet Authentication: NDN mandates data producers
to create a digital signature on each of their data packets
at the time of creation. The cryptographic signature binds
the name of the data to its content, so the content
authenticity can be verified independently of where a data
packet comes from.

3) Efficient Data Delivery: Naming and securing data at the 
network layer brings a number of important properties in 
data distribution. 


A. NDN System Architecture 


The NDN architecture is based on the proposed Content-Centric
Networking (CCN). NDN proposes decoupling the name from the
location and linking names to contents, to facilitate
discovery and retrieval by name prefix. Unlike the IP
architecture, which identifies packets by source and destination
addresses, NDN uses a route-by-name principle. An Interest
packet is sent by a consumer to request content from the
network, and the producer returns a response to the consumer in
the form of a Data packet. Each router contains three data
structures. The Content Store (CS) is analogous to the buffer of
an IP router; communication in the IP architecture is
point-to-point, whereas in NDN the contents are stored in cache
memories called the CS. The Pending Interest Table (PIT) lists
previously forwarded Interest packets that are waiting for a
response. The Forwarding Information Base (FIB) supports a
routing protocol based on name prefixes. It is similar to the
FIB of the IP protocol.


• Naming: The NDN architecture is mainly based on the
Content Name (CN) to facilitate search and retrieval of
data, which makes naming the most important part of the
NDN design.

• Routing: In NDN the main mission of routing is to
determine the topology and policies, to follow their
long-term changes, and to update the forwarding table
accordingly.

• Forwarding process: The consumer sends an Interest to
the router to which it is connected. Each time an Interest
packet arrives, a longest-prefix match is performed on its CN.

• Cache decision policy: The concept of caching strategy,
or storage in the network, is among the most important
principles in NDN. Storage in the intermediate nodes
reduces the burden on data providers. The caching strategy
consists mainly of cache placement and cache replacement.
Cache placement is the decision whether or not to cache a
content at a router. Cache replacement is the decision
about which cached content to evict when the cache is full.
Least Recently Used (LRU) is the most widely used
replacement policy.

• Security and privacy: Security: Unlike TCP/IP, which
provides security at the end points, NDN secures the data
itself. Privacy: In IP networks, it is easy to know who
consumes what content by examining the header and
destination address of the packet [6]. In NDN, given the
caching strategy and naming, it is possible to know the
content, but it is difficult to find out who requested the
content.

• Mobility: NDN addresses the restrictions of IP: users can
easily move around the network, regardless of IP address
changes.
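
The Least Recently Used replacement policy named in the cache decision item can be sketched in a few lines of Python. This is a minimal illustration, not the cache of any NDN forwarder; the class name LRUContentStore and its methods are hypothetical.

```python
from collections import OrderedDict


class LRUContentStore:
    """Content Store with Least Recently Used replacement (illustrative)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()          # least recently used entry first

    def get(self, name):
        """Return the cached content for `name`, or None on a cache miss."""
        if name not in self._items:
            return None
        self._items.move_to_end(name)        # mark as most recently used
        return self._items[name]

    def put(self, name, content):
        """Cache `content`, evicting the least recently used entry if full."""
        self._items[name] = content
        self._items.move_to_end(name)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used

    def names(self):
        """Names currently cached, least recently used first."""
        return list(self._items)
```

With a capacity of two, caching /a and /b, reading /a, and then caching /c evicts /b, because /b is the entry that was used least recently.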


B. Operational Needs of NDN 


Routing scalability: Interest packets are forwarded by name,
but the FIB size is limited. Mobile publishing: NDN supports
data retrieval well, but faces some challenges when data
producers move. Certificate provisioning: Because of the
asynchronous nature of communication between data producer and
receiver, the producer may not be online at the time the data is
retrieved. The consumer still needs to verify the data, and
public key certificates are used for this verification.


C. DNS 


DNS has a unique tree-like hierarchical name space. Each node
in the DNS tree is associated with several types of resource
record (RR) sets. The DNS name space is managed in DNS zones,
and each zone hosts its RR sets. Zones are hosted on multiple
name servers. DNS depends on caching resolvers between end
hosts and name servers.


V. COMMONALITY AND DIFFERENCE BETWEEN NDN AND 
DNS 


Both DNS and NDN allow name-based data retrieval, but they
differ in several respects. DNS operates at the application
layer, whereas NDN builds the data fetching semantics directly
into the network layer. A DNS resolver has to explicitly select
the name servers from which to fetch data, whereas NDN consumers
just request data, unaware of where this data may come from.
NDN supports caching at the network layer, whereas DNS
introduces application-level caching resolvers into the system.
DNS secures data by attaching a digital signature to each RR
set; NDN, on the other hand, mandates a per-packet signature
that can be validated by the consumer [9].

• NDNS Design: The design of NDNS inherits several
basic concepts from DNS, including domains, zones,
resource records (RR), name servers, caching and stub
resolvers, and iterative and recursive queries.

• NDNS Namespace and Naming: NDNS also uses zones as
units of administrative management of namespaces.
Differently from DNS, NDNS uses a set of naming
conventions to steer NDNS query Interests towards the
servers of specific zones.

• Data Formats: NDN retrieves data in units of packets.
Each NDNS Data packet contains enough information to
indicate with which NDN name it is associated.

• Iterative Resolution: As an example of iterative
resolution, consider the lookup of a TXT record for a
domain such as “/net/ndnsim/www”. When the caching
resolver receives a referral Data packet back, the packet
should contain information on how to reach the NDNS
servers of the next-level zone.

• Recursive Resolution: The NDNS recursive query is
expressed using an Interest whose name carries the
signature, the domain name, and the requested record type.
The response to the caching query is the NDNS RR
encapsulated in a Data packet that matches the query and
carries the signature of the caching resolver. The
encapsulated Data packet contains the original signature
of the authoritative zone.

• Security: NDNS uses a hierarchical trust model similar
to DNSSEC. In this model all records in a zone are signed
with one or more data signing keys (DSKs). DSKs are signed
by one or more key signing keys (KSKs) of the same zone.
KSKs are either self-signed, for well-known zones, or
signed by delegation key records stored in the
corresponding parent zone.

• Zone Update and Synchronization: To update NDNS
records in a zone, one needs either to leverage Interests
to carry Data packets with the new or updated records, or
to solicit Interests for the updates. The updated records
must then be synchronized efficiently. Such
synchronization is a classic problem with several
promising NDN-based solutions.
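
The iterative resolution item above can be sketched as a walk from the top-level zone down to the zone that holds the record. The Python sketch below is a simplification under stated assumptions: the function names and the zones dictionary are hypothetical, each zone is modelled only as the set of child names it delegates or holds, and the actual NDNS query naming conventions and packet formats are not reproduced.

```python
def zone_prefixes(domain):
    """List the successive name prefixes of an NDN-style domain name."""
    parts = [p for p in domain.split("/") if p]
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def resolve_iteratively(domain, zones):
    """Walk from the root zone down, following referrals.

    `zones` maps a zone name to the set of child names it delegates or holds.
    Returns the list of zones queried, in order.
    """
    queried = []
    current = "/"
    for prefix in zone_prefixes(domain):
        queried.append(current)               # ask the current zone about `prefix`
        if prefix not in zones.get(current, set()):
            break                             # no referral or record: stop here
        current = prefix                      # referral: continue at the next level
    return queried
```

For the domain “/net/ndnsim/www” the sketch queries the root zone, then /net, then /net/ndnsim, each answer telling the resolver where to ask next.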


VI. SECURITY IN NDN


In NDN, a data-centric security model is adopted. This model
is based on the following security requirements [1] [3]:

• Integrity: data are not altered.

• Name Authenticity: the binding between the name and
the associated data is not altered.

• Provenance: data are published by a suitable producer
able to produce them. This combines the concepts of
producer identification and authentication.

• Relevance: received data satisfy the expressed request.

• Confidentiality: data are readable only by authorized
entities.

• Access control: reading, writing, and management of data
are reserved for authorized entities.

• Availability: data are accessible to the authorized
entities, according to predefined performance.

Relevance is explicit since the data name is significant and it
is the same in Interest and Data packets. Producer identification
is ensured if the name contains valid information about the
producer's real identity. Confidentiality can be achieved by
encrypting sensitive data. For availability, NDN is robust
against the majority of attacks on the current Internet. Indeed,
a Data packet must be preceded by a request, which prevents the
sending of unsolicited data. Also, targeting hosts becomes
difficult since they can no longer be directly addressed.
However, NDN remains vulnerable to other attacks, such as cache
or data poisoning and Interest flooding attacks [4] [5]. To
ensure data integrity, name authenticity and producer
authentication, each Data packet is digitally signed with the
private key of its producer. The signature is computed on the
entire packet, which ensures a binding between the data and its
name. Its verification requires the producer's public key.
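
How a signature over the whole packet binds the data to its name can be illustrated with a toy example. The sketch below uses textbook RSA with a small key built from two known Mersenne primes and no padding; this is an assumption made purely so that the example runs with the Python standard library. It is not secure and is not the signature scheme of any NDN implementation, and the function names are hypothetical.

```python
import hashlib

# Toy textbook-RSA key built from two known Mersenne primes.
# Far too weak for real use; real deployments rely on a vetted crypto library.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                                # public modulus
E = 65537                                # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))        # private exponent (needs Python 3.8+)


def digest(name, content):
    """Hash name and content together so the signature binds both."""
    data = name.encode() + b"\x00" + content
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(name, content):
    """Producer side: sign with the private exponent D."""
    return pow(digest(name, content), D, N)


def verify(name, content, signature):
    """Consumer side: check with the public key (N, E)."""
    return pow(signature, E, N) == digest(name, content)
```

Because the name is part of the signed input, a valid signature for one name does not verify if either the name or the content is changed, wherever the packet was cached.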


VII. TRUST IN NDN


Various trust solutions proposed in NDN are classified into 
two categories: (1) hierarchical, requiring a trust anchor and 
(2) web-of-trust. In the following, we present a representative 
sample for each category as well as a solution, ensuring 
both security and trust. Identity-based cryptography (IBC) 
represents a public-key cryptosystem in which any unique 
string can be used as a public key. The associated private key 
is generated based on the public key and the public parameters 
and secret key of a trusted Private Key Generator (PKG). The 
data name, its prefix or the identity of the producer, can be 
used as an IBC public key. The part selected as the public 
key and the name of the PKG public parameters params are 
indicated in the signature information field of the Data packet. 
Upon receiving this packet, a requester recovers params. He 
combines them with the IBC public key to verify the signature 
and to ensure the producer authentication, data integrity and 
name authenticity. 


A. Trust Solution Implemented in the NDN Testbed 


A hierarchical trust solution was proposed and implemented
in the NDN testbed. In this solution, a key pair (pkroot; skroot)
is associated with the root of this testbed. pkroot represents
the trust anchor; it is known and trusted by all entities. The
corresponding private key skroot is used to sign Data packets
containing the sites' public keys. The operators of these sites
sign, in turn, their users' keys. Each user then uses his own key
to sign the keys of his devices and applications. The root
of the testbed is the root Certification Authority (CA), and the
sites are the intermediate ones. All generated public keys are
published under the same prefix /ndn/keys. Trust in a public
key, required in the signature verification of a particular Data
packet, is determined by performing the following steps:


i) Check the field keyLocator in this packet, which
must match a public key name.

ii) Verify whether the name of the key (KeyName) is
that of the root key or of one of the keys verified
previously. If there is a match, the packet signature is
verified.

iii) Otherwise, request the key based on the field
KeyName. Upon receiving the Data packet containing
this key, verify its signature by recursively
applying steps i, ii and iii.
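
The recursive steps above can be sketched as a walk along the chain of signing keys. This is only a structural illustration: the function names and the certificates dictionary are hypothetical, key names other than the /ndn/keys prefix are invented for the example, and the cryptographic check of each signature is left out.

```python
def is_trusted(key_name, certificates, trust_anchor, verified=None):
    """Decide whether `key_name` chains up to the trust anchor.

    `certificates` maps a key name to the name of the key that signed it
    (the KeyName found in the key's own Data packet). Actual signature
    checking is omitted from this sketch.
    """
    verified = set() if verified is None else verified
    return _walk(key_name, certificates, trust_anchor, verified, set())


def _walk(key_name, certificates, trust_anchor, verified, seen):
    if key_name == trust_anchor or key_name in verified:
        return True                       # step ii: root key or already verified
    if key_name in seen or key_name not in certificates:
        return False                      # signing loop, or key cannot be fetched
    seen.add(key_name)
    signer = certificates[key_name]       # step iii: fetch the key, read its signer
    if _walk(signer, certificates, trust_anchor, verified, seen):
        verified.add(key_name)            # remember keys verified so far
        return True
    return False
```

Passing the same `verified` set across calls plays the role of "the keys verified previously" in step ii, so a chain that has been checked once is not walked again.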


B. Trust Solution Proposed for ChronoChat


A chat application named ChronoChat was proposed in
[6]. It adopts a trust model based on the concept of
web-of-trust. This model establishes trust in users, identified
by their public keys, based on endorsements. These endorsements
are provided by other users and signed with their private keys.
For a particular namespace N, a user A signs an endorsement for
another user B after validating B's public key. This
endorsement is then published in a Data packet. It asserts that
A trusts B to publish data under the namespace N.

Three trust levels for users are defined: 

• Level 1: a user is trusted for the publication of Data
packets, under a particular namespace.

• Level 2: in addition to trust level 1, a user is trusted to
endorse the publication of Data packets by another user,
under a particular namespace.

• Level 3: in addition to trust level 2, a user is trusted to
delegate endorsement authority to another user, under a
particular namespace. An indirect relationship of trust can
thus be constructed.

In ChronoChat, on receiving a Data packet, a requester
consults the field KeyLocator of this packet and retrieves
the public key name needed for signature verification. This
name refers to the identity of the packet producer. Based on
all the previously collected endorsements, the requester can
decide whether the producer is trustworthy. He can then retrieve
the corresponding public key to verify the signature.


VIII. SECURITY EXTENSION 


In NDN, the structure and semantics of names have a
deep impact on security. Indeed, their meaning ensures
relevance and their flexibility ensures producer identification.
Data integrity, name authenticity and producer authentication
are based on the verification of the digital signature using
the corresponding public key. To make this verification
possible, a link must be established between the name and this
key. We propose a security extension which is essentially based
on HIBC.


A. Hierarchical Identity Based Cryptography 


Hierarchical Identity-Based Cryptography (HIBC) is a vari- 
ation of IBC that reflects an organizational hierarchy [2]. 
Indeed, this cryptosystem does not have a single PKG that 
owns the master key and has to deliver private keys for all 
users. A root PKG is only required to produce private keys 
for domain-level PKGs, and it delegates private key generation 
and identity authentication to lower-level PKGs. 


B. Proposed Naming System 


The proposed naming system integrates the name with HIBC. The
root PKG (level 0) generates the public parameters params as well
as the private key associated with /Supcom.tn.

The level 1 PKG belongs to SupCom and a level 2 PKG
is attached to the administrative agent identified by the unique
number 274. ‘content ID’ identifies a specific data item emitted
by this producer. Finally, ‘validity date’ indicates the validity
period of the PKG public parameters.
[Figure: the name
/Supcom.tn/admin_274/academic_calendar_2016_2017/1_1_2018/2/1
annotated with its parts: routable name, organizational name,
producer ID, content ID, validity date, and version and segment.]

Fig. 3. Name structure in NDN with the integration of HIBC


The proposed security extension better meets the security
requirements in NDN. It ensures producer identification
and authentication, data integrity, name authenticity and
relevance. For example, in Fig. 3, the use of the name structure
as an HIBC public key allows the signature to be verified with
the public parameters of the root PKG. This guarantees
producer authentication, data integrity and name authenticity.
The producer ID part includes information about the producer's
identity, which ensures its identification. Finally, the content
ID part contains expressive information about the academic
calendar, which lets a requester judge whether data with such a
name are the desired ones. To ensure confidentiality, a producer
can encrypt sensitive data with a random symmetric key. It can
then encrypt this key using the requester identity as an HIBC
key. In the proposed extension, a different HIBC key is
associated with each Data packet. This key is derived from the
data name and the PKG public parameters params. Since the same
public parameters are used throughout an HIBC system, and the
number of these systems is much lower than that of users, the
problem becomes less heavy.
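
The name structure of Fig. 3 can be made concrete by splitting the example name into its parts. The Python sketch below is illustrative only: the type HibcName, its field names, and the reading of the last two components as version and then segment are assumptions; the text itself only names the organization, the agent number, the content ID and the validity date.

```python
from typing import NamedTuple


class HibcName(NamedTuple):
    """Fields of the six-component example name (field names are assumed)."""
    organization: str     # level 1 PKG identity, e.g. "Supcom.tn"
    producer: str         # level 2 identity, e.g. "admin_274"
    content_id: str
    validity_date: str
    version: str
    segment: str


def parse_name(name):
    """Split a six-component name into the parts annotated in Fig. 3."""
    parts = name.strip("/").split("/")
    if len(parts) != 6:
        raise ValueError("expected 6 name components, got %d" % len(parts))
    return HibcName(*parts)
```

Applied to /Supcom.tn/admin_274/academic_calendar_2016_2017/1_1_2018/2/1, the content ID is academic_calendar_2016_2017 and the validity date is 1_1_2018, matching the description above.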


IX. CONCLUSION 


NDN is an information-centric Internet architecture, which
offers better support for content distribution applications such
as video playback and video streaming. The realization of NDN
faces a number of challenges. One challenging issue is being
able to evolve existing applications. Another is that replacing
IP addresses with names is time-consuming and memory-consuming.
For NDN to become successful and widely deployed, it must have a
robust and efficient congestion control mechanism.


In this paper we described the design of NDNS, which
provides an always-on lookup service in an NDN network.
By providing a service to store and look up forwarding hints,
NDNS provides facilities for retrieving named data.

In this paper, we addressed the security and trust aspects of
the NDN architecture. We analyzed its data-centric security
model and the different proposed trust models. We then proposed a
security and trust extension in which the data name acts as an
HIBC public key. The changes brought by our security extension
ensure producer identification and authentication, data
integrity, name authenticity and relevance. It was shown that,
on the requester side, NDN with our security extension provides
performance comparable to, and in some cases better than, plain
NDN.

Some applications benefit from the characteristics of NDN.
NDN has clear advantages in educational services, multimedia
applications, commercial applications, and vehicular
communication networks. As the next step, we plan to continue
our investigation of Named Data Networking (NDN).
Abstract—Continuous integration is a software engineering
practice in which isolated changes are immediately tested and
added to the code base. Continuous Integration (CI) is very
useful for applications that involve many files and multiple
developers. In this paper we discuss continuous integration
through a literature review on build waiting time; some
applications, namely Modern Client-Side Web Applications and
Reducing Web Application Development Risks; some
implementations, namely Code-Churn Based Test Selection and a
Workflow for a Power System Modelica Library; and the DevOps
concept.

Index Terms—Security Issues, Challenges and Solutions for 
E-Commerce Applications over Web 


I. INTRODUCTION 


The literature review aims to understand the effects of build
waiting time in software engineering and to get input from
waiting time research in other disciplines. Two literature
reviews, the first on build waiting time and the second on
waiting times in the contexts of service operation, web use and
computer use, lead to the conclusion. A two-minute build waiting
time was considered optimal, but under 10 minutes was considered
acceptable.

Continuous Integration (CI) is very useful for applications 
that involve many files and multiple developers. Unfortu- 
nately, not all types of applications can easily apply this 
approach. Apparently, CI does not gain a lot of attention with 
Modern Client-Side Web Application (MCSWA) because it 
requires complicated testing, i.e. the running environments 
are browsers. There is no compiler or error warning when a 
developer writes bad code and the build behavior in the usual 
CI practice is different from the build process in MCSWA. 

The development process of Web applications (Web apps) must
overcome the challenges of environmental changes and many
kinds of service requests. To reduce development risks
effectively, a Web app must decrease the impact of all of these
changes[8]. Many events, including error correction, requirement
specification revision, environment evolution, and resource
adjustment, often cause changes to the project plan. Web app
development must strengthen its defect identification and quick
revision mechanisms to reduce the risks of environmental changes
and modification requests.

DevOps is a set of practices which helps to resolve the
gap between developers and operations in software delivery.
At the same time, it is not limited to development and
operations, and it helps to achieve speedy, optimized and
high-quality software delivery. DevOps is a set of principles for
software delivery which also focuses on speed of delivery,
continuous testing in a production-like environment, being in a
shippable state on any day, continuous feedback, the ability to
react to change more quickly, and teams working to accomplish a
goal instead of a task (there is no delay). DevOps extends agile
principles to the entire software delivery pipeline[13].


Continuous integration is a software engineering practice in
which isolated changes are immediately tested and added to the
code base. Code churn is a measure for estimating the impact of
code changes. Here we discuss applications and the problems that
DevOps helps to avoid. Continuous integration promises
advantages in large-scale software development by enabling
software development organizations to deliver new functions
faster. However, implementing continuous integration in large
software development organizations is challenging for
organizational, social and technical reasons. The test selection
method is based on an analysis of correlations between test-case
failures and source code changes, and is evaluated by combining
semi-structured interviews and workshops with practitioners at
Ericsson and Axis Communications in Sweden. The results show
that, using measures of precision and recall, the test cases can
be prioritized. The prioritization leads to finding an optimal
test suite to execute before the integration.


Traditional simulation tools for power system studies are,
in general, shipped with built-in and closed model libraries.
Typically, the implementation of the models is not thoroughly
documented, preventing the user from gaining a full
understanding of their implemented behavior. Previous efforts
from the authors have focused on the development of an open
source software library of power system components developed
using Modelica: the Open-Instance Power System Library
(OpenIPSL), which provides models that can easily be accessed
and studied by the user. Employing the latest technologies
available in the software development community, this paper
details the implementation of a continuous integration workflow,
providing automated testing and behavior verification of the
library's models. This platform seeks to increase the library's
stability and to provide more reliable models developed
collaboratively by multiple individuals.
II. BUILD WAITING TIME IN CONTINUOUS INTEGRATION


This section shows that the acceptable build waiting time
varies according to the context, and specifies that the build
waiting time should be less than 10 minutes. Two literature
reviews were conducted. In the first, build waiting times were
studied in the context of CI. Two digital databases were used for
the initial searches: Scopus and Google Scholar were used to
search the body text of the articles. These two yielded three
relevant articles. According to Brooks[1], four variables are
related to build time: commit size and frequency, build down
time, development flow and developer satisfaction. According to
Rogers[2], build waiting time affects commit size and frequency,
integration effort and development flow. According to
Rasmusson[?], the effects can be described as feedback delay,
integration effort, development flow and team morale.

Second, three relevant waiting time contexts were identified:
service operation, web use and computer use. In service
operation, Baker and Cameron[3] did not propose any optimal
waiting times. Instead, they encouraged minimizing the
perception of waiting time rather than the actual waiting time.
In web use, Nah's[4] own study suggests that the task given to
the web user can affect the tolerable waiting time and that
giving feedback increases the tolerable waiting time. In computer
use, Dabrowski and Munson [5] summarized the studies regarding
computer system response time.

Seven effects of build waiting time were found in the
literature, and they are categorized into CI-specific, cognitive
and emotional effects[6]. The sources state that a build waiting
time of 2 to 10 minutes is considered optimal. The service
operation context confirms that waiting time is related to user
satisfaction. The web use context confirms that waiting time is
related to memory, flow and satisfaction. Computer use studies
state that overly long waiting times cause anger, frustration
and annoyance; long build times may therefore decrease developer
satisfaction.
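
The waiting time thresholds can be turned into a small check that a team might run against its own build times. The sketch below follows the reading given in the introduction (about two minutes is optimal, under 10 minutes is acceptable); the function name and the label "too long" for anything at or above 10 minutes are assumptions made for illustration.

```python
def classify_build_wait(minutes):
    """Classify a build waiting time using the thresholds from the review."""
    if minutes < 0:
        raise ValueError("waiting time cannot be negative")
    if minutes <= 2:
        return "optimal"        # about two minutes was considered optimal
    if minutes < 10:
        return "acceptable"     # under 10 minutes was considered acceptable
    return "too long"           # label assumed for this sketch
```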


III. APPLICATIONS


A. Continuous Integration Processes for Modern Client-Side 
Web Applications 


Project deadlines may be affected by unexpected and
unforeseeable problems. Continuous Integration (CI) is a very
promising approach to improve this situation and enhance
development quality and efficiency. Some management processes
need to run automatically in order to reduce the number
of defects during the integration process. This paper
illustrates a solution: a proposed CI process that can be
adapted well to web development environments. It also
introduces a way to run tests on cloud services in order to
minimize time and cost compared with expensive local
installations of many different testing systems. The main
objective is to convert from manual integration to automated
integration. Common CI is just a basic concept which still needs
to be customized in order to adapt smoothly to modern
client-side web applications.


1) Development and Manual Testing: This process starts
with developers checking out the project from the version
control system (VCS), so that they can work on the assigned
tasks. The specific tasks of developers consist of maintaining
source code and tests. Developers also perform manual tests on
their local machines in order to make sure that all tests pass
and the source code works correctly.


2) Automated Process on a CI Server: The CI server is
responsible for the four mandatory automated processes[7].


• Test before Build: The aim of Test before Build is
to catch errors as soon as a new commit is made.
Fortunately, with modern web applications we do not
need to compile our code before running it.

• Build: The build process is performed automatically on
the CI server after a successful Test before Build. When
the build process finishes, the CI program searches
the log file for errors. If there is any error, the program
sends an email with the log file to the project manager.
Otherwise, the CI program ends the build process and the
CI server waits for the next trigger event.

• Test after Build: The Test after Build process runs
on an external platform, meaning that all tests are run on a
cloud-based service. The cloud-based service allows us to
perform the tests in a number of different environments
(operating systems, browsers, system resources, etc.)
that do not exist on our local machines. This is very
helpful because covering many environments on a local
machine is very costly.

• Deploy: Deploying a web application on a test server
ensures that the application can be tested manually.
Since the programmers deploy multiple times during the
development process, it is preferable to automate this job.
Consequently, the deployment process is started
once the Test after Build process finishes without any
defects.
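
The four automated processes above can be sketched as a chain of stages in which each stage runs only if the previous one succeeded. The following Python fragment is a minimal illustration, not the implementation used in the surveyed work: the names `run_pipeline`, `PipelineResult` and the stand-in stage functions are assumptions made for this example, and the `notify` callback merely stands in for emailing the log file to the project manager.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical stage type: a name plus a callable returning (ok, log_text).
Stage = Tuple[str, Callable[[], Tuple[bool, str]]]


@dataclass
class PipelineResult:
    succeeded: bool
    completed: List[str] = field(default_factory=list)
    failed_stage: str = ""
    log: str = ""


def run_pipeline(stages: List[Stage],
                 notify: Callable[[str, str], None]) -> PipelineResult:
    """Run stages in order; stop and notify at the first failing stage."""
    result = PipelineResult(succeeded=True)
    for name, action in stages:
        ok, log = action()
        if not ok:
            notify(name, log)  # stand-in for emailing the log file
            result.succeeded = False
            result.failed_stage = name
            result.log = log
            return result
        result.completed.append(name)
    return result


# Stand-in stages; real ones would call a test runner, a build script,
# a cloud testing service and a deployment script.
sent = []
stages = [
    ("test-before-build", lambda: (True, "unit tests passed")),
    ("build", lambda: (False, "ERROR: bundling failed")),
    ("test-after-build", lambda: (True, "cloud tests passed")),
    ("deploy", lambda: (True, "deployed to test server")),
]
outcome = run_pipeline(stages, notify=lambda stage, log: sent.append((stage, log)))
```

In this run the build stage fails, so the later stages are never started and the CI server would simply wait for the next trigger event.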


3) Implementation: The whole implementation is divided
into five subtasks. Firstly, we developed a local testing
environment which automates local browsers and runs on
our CI server. Secondly, we developed a build script for the
Build process. Thirdly, we defined an automated cloud testing
process. Fourthly, we developed a deployment script for the
Deploy process. Lastly, we created a script to ensure that the whole
process runs successfully and to perform error handling tasks.


4) Outcomes: 


• Errors could be detected at an early stage.

• Collaboration within the development team improved.

• Developers have more confidence in their code.

• Quality Assurance (QA) improved accordingly.

• With this system we could replace many manual tasks,
including verbal communication between developers and
QA.


B. Applying Continuous Integration for Reducing Web Application
Development Risks


1) Agile Development: Most software development models
depend heavily on user requirements, yet requirement
documents have little chance of being fully correct,
complete and consistent. The proposed development models
therefore modify the requirement specification style. With an
iterative development model, the user can provide the requirement
items iteratively, and software development risks can be
greatly reduced. Each development model should have an
adjustment strategy to handle changes in user requirements [12].

On the time management side, agile software development
applies a time-boxing approach to control the process schedule [12].
The software project must release a new version every
two or three weeks. Each day, a fifteen-minute stand-up
meeting is held to ensure full communication
between client and developers. With respect to the impact of
changes, agile development greatly reduces schedule delays,
budget overruns and failures to meet user requirements, so
the risk from software development changes can be effectively
reduced [12].

2) Advantages of Continuous Integration: The main goal of
CI is to identify software errors and defects quickly, so
that they can be corrected and revised
as soon as possible. Its advantages are:


• High efficiency.

• Reduced development risks.

• Smooth communication.

• Improved quality.

• Increased morale.


3) Characteristics of Web app: The Web app development
process must face the challenge of many changes:
updates, requirement specifications, environment evolution,
resource adjustment, etc.


• Friendly user interface.

• Support for a wide range of users.

• Rapid handling of change service requests.

• Timely addition of new service items.

• Quick migration to new platforms.

• High integration ability.

4) Web app Development Risks: 


• Unit testing risk: Incomplete unit testing always causes
errors and defects.

• System integration risk: Without automated tools and
procedures, the integration process is often delayed or
fails, which lets errors and defects flow into the following
development phases.

• Communication risk: Communication barriers between
the user and the developer prevent requirement documents
from reaching correctness, completeness and consistency.

• Quality and efficiency risk: Interface mistakes always
cause system errors and defects, which deeply impact the
quality of the software application.


5) Major components of CI environment: 


• Automated testing tools: After coding, each unit program
needs to undergo complete unit testing by the programmer.
To improve unit testing quality and productivity, the CI
environment should provide suitable automated unit
testing tools [12].

• CM (Configuration Management) procedure and version control:
A software system has various versions, so version
control is a necessary tool to recover previous versions
and identify the differences between versions.

• Source repository: The development documents, source
code and related documents should be saved to the source
repository.

• Automated integration: According to the integration
procedure, module integration testing must first
be executed and pass verification, and then subsystem
integration testing must be completed and pass
verification [12]. Finally, all subsystems are integrated into
a system.

• Automated deployment: Automated deployment can
increase software development efficiency and identify new
errors and defects in the workable software in a timely manner.

• Errors and defects detector: When software system
errors and defects are identified as early as possible, the revision
work is simplified and the modification effort is reduced.


6) Web app Continuous Integration procedure: The CI
environment combines useful automated tools and necessary
development procedures to increase software development quality
and productivity.


• The Web app Continuous Integration Procedure (WACIP) is
divided into five phases, described as follows:


— Unit testing phase: This phase supports automated unit
testing.

— CM phase: Unit programs that have passed unit testing
are checked in to the source repository, to be managed
by the CM system.

— Integration and deployment phase: Based on the
version timestamps of the source code, object code and
image files, the build definition file can automatically
recompile and rebuild the target products.

— Verification and validation: Based on the workable
software, the user can take part in and assist software
verification and validation.

— Continuous improvement phase: In this phase CI
operations continue, and integration is performed
more and more frequently.


• Based on the advantages of WACIP, the improvements with
respect to Web app development risks can be evaluated as
follows:


— Reduce unit testing risk: The test cases and unit
test tools assist automated unit testing [12]. Unit
programs that pass the unit testing criteria
enter the CM system and are saved to the source repository to
support the operations of the following phases. Otherwise, the
errors in the program must be identified and debugged
until the unit testing result passes the testing criteria.


Melna Davis et al., “Continuous Integration” 



Proceedings of the Vidya Computer Applications Departmental Seminar (VCADS - 2018), 4 - 5 April 2018 
Department of Computer Applications, Vidya Academy of Science & Technology, Thrissur — 680501 


— Reduce system integration risk: The traditional software
development model handles integration testing only
in the test phase and tests a limited number of times. In
agile development, CI with WACIP adopts automated
tools that can handle unit testing, integration
testing and deployment tasks in a timely manner [12].

— Reduce communication risk: Communication barriers
between the user and the developer are a key factor
affecting success. CI with WACIP uses a version
control tool and increases communication frequency,
dramatically reducing communication barriers and
software development risks.

— Reduce quality and productivity risk: The CI environment
combines many automated tools that can increase software
quality. CI with WACIP can improve not only
development efficiency but also software
quality [12].
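
The integration and deployment phase of WACIP decides what to rebuild from the timestamps of source and target files. A minimal sketch of that decision in Python is given below, in the spirit of a make-style build definition; the function name `needs_rebuild` and the file names are assumptions for illustration and do not come from the surveyed paper.

```python
import os
import tempfile
from typing import Iterable


def needs_rebuild(target: str, sources: Iterable[str]) -> bool:
    """Return True if target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_time for src in sources)


# Small demonstration with temporary files and explicit timestamps.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "module.src")
    target = os.path.join(tmp, "module.obj")
    open(source, "w").close()
    missing_target = needs_rebuild(target, [source])  # target not built yet

    open(target, "w").close()
    os.utime(source, (1000, 1000))  # source last modified at t=1000
    os.utime(target, (2000, 2000))  # target built later, at t=2000
    up_to_date = needs_rebuild(target, [source])

    os.utime(source, (3000, 3000))  # source edited after the build
    stale_target = needs_rebuild(target, [source])
```

Only targets that are missing or older than one of their sources are rebuilt, which keeps frequent integration runs short.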


IV. UNDERSTANDING DEVOPS AND BRIDGING THE GAP 
FROM CONTINUOUS DELIVERY 


Software delivery in enterprises is going through a
wave of change. Market needs are
changing, and the market demands quick, continuous
delivery of software. Customers expect continuous
engagement so that they can provide continuous feedback [13].
However, the focus has been mainly on software development, with
the operations side of software delivery lagging behind. As a result,
software development teams are able to deliver faster than operations
teams can absorb the builds. Optimization must therefore cover the
whole software delivery cycle: if the delivery of one phase
is delayed, the entire delivery is delayed.


A. DevOps Applied To Various Phases Of Software Delivery 


1) Continuous Planning: Business plans have to be agile,
that is, able to adjust quickly to changing market
conditions [13], and must be modified as
needed based on market feedback. It is difficult for test
teams to adapt to quick changes in the business environment.
DevOps makes this possible through a prioritized product backlog,
a continuous channel of feedback with customers and the ability
to reprioritize the product backlog at any time, directly taking
the business angle into consideration.

2) Continuous Integration: Continuous integration basically
means integrating early: do not keep changes only in your
workspace, but share them with the team and validate
continuously how the code behaves. Further, this stage of process
optimization aims at automation, so that validation starts as soon
as a developer delivers a change [13]. This has to be a
repeatable, continuous process throughout the development cycle.

3) Continuous Deployment: This is the heart of DevOps and
forms the critical piece of overall software delivery
optimization. Surveys have shown that a majority of
organizations have problems with delays in software
delivery [13]. DevOps principles recommend automating
deployment and the provisioning of hardware, and various cloud
providers play a crucial role in this field.


4) Continuous Testing: For testing to be repeated over time it
should be automated, and there are enough technologies available to
meet that goal. Manual testing processes must be evaluated
for possibilities of automation, and in the majority of cases there
will be ways to automate them. The principle of
continuous testing not only moves the testing process earlier
in the cycle but also allows the tests to be carried out on
production-like systems [13].

5) Continuous Monitoring: With the capability to test
early and on a production-like system, there is an opportunity
to observe various quality parameters throughout, and hence
the ability to react to any surprises in a timely manner.


B. Software Delivery Pipeline With DevOps 


The various DevOps approaches are applied at the various stages
of the software delivery pipeline. It helps to compare this
pipeline to the delivery pipeline of a
manufacturing unit. Such a pipeline helps to achieve quick and
consistent releases.


C. Bridging The Gap From Continuous Integration To Con- 
tinuous Delivery 


Continuous Integration processes have been around for a while
now and, as part of the transformation over the last decade, most
project teams have processes in place that align with
continuous integration [13].

Continuous Delivery tries to optimize infrastructure
management and the critical need to balance time and
resources [13]. Validating software products involves
multiple variable inputs such as product versions, different operating
systems, different third-party software versions, etc. Testing all
these combinations manually is neither humanly possible nor practical.

1) How to Bridge the Gap: Case Study: There are two
key pieces in addressing the business problem: deployment
automation tools (e.g. IBM uDeploy) and a
cloud-based resource provider (e.g. IBM Smart Cloud
Orchestrator) [9], [10]. Project teams can identify the various
topologies that need to be tested and create corresponding
deployment patterns [11]. They should not only create deployment
patterns but also automate the steps needed thereafter, such as
installing the application stack on the cloud-provisioned image,
populating the test data on this image, triggering the automated test
suite and pushing the results to a central repository. The team must
carefully choose the automation language (Chef, Puppet,
shell script, Perl, Python, etc.), keeping in mind the platform
coverage needed. Once this automation is built, getting to
any test configuration is just a matter of choosing the input variables
(version, OS, etc.) and starting the run. Overall,
developers are able to validate defects much
more quickly, without having to wait for hardware to be manually
configured with the latest software bundles containing their fix.
Last but not least, once done, the computing
resources are simply released so that they can be used by anyone else.
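
The idea of picking input variables, running the automated steps and then releasing the resources can be sketched as follows. This is only an illustration in Python under stated assumptions: the variable values, the `provisioned` context manager and `run_configuration` are hypothetical stand-ins and do not correspond to the API of IBM uDeploy, IBM Smart Cloud Orchestrator or any other real tool.

```python
from contextlib import contextmanager
from itertools import product

# Hypothetical input variables; real values depend on the product under test.
VERSIONS = ["1.0", "1.1"]
OPERATING_SYSTEMS = ["linux", "windows"]
DATABASES = ["db-a", "db-b", "db-c"]

active = set()  # stands in for the pool of cloud resources currently in use


@contextmanager
def provisioned(config):
    """Acquire a (simulated) cloud image for config and always release it."""
    active.add(config)
    try:
        yield config
    finally:
        active.discard(config)  # release so that other teams can reuse it


def run_configuration(config):
    """Placeholder for: install stack, load test data, run suite, push results."""
    version, os_name, database = config
    return {"version": version, "os": os_name, "database": database, "passed": True}


all_configs = list(product(VERSIONS, OPERATING_SYSTEMS, DATABASES))
results = []
for config in all_configs:
    with provisioned(config):
        results.append(run_configuration(config))
```

Even this toy matrix of two versions, two operating systems and three third-party components yields twelve configurations, which shows why configuring each one by hand quickly becomes impractical.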

2) Benefits: 

• Time saving.

• No need for dedicated hardware.

• Reliable and scalable infrastructure provisioning through
TAAC.

• Deployment becomes a consistent, repeatable process.

• The continuous deployment model gives developers
access to production-like systems, enabling
validation in an environment similar to production.

• Significant cost savings, as teams can share the pool of
resources.


D. Applicability 


The fundamental problems that the DevOps approach tries to
address are adaptability to change, speed to market and
maintaining high quality at low cost, which are universal business
problems in any type of software project. DevOps principles
are generically applicable to software delivery and are not tied
to any specific type of product or service. They can be applied
to complex enterprise-level product development, to a small
web application or even to mobile app development [13].


V. SUPPORTING CONTINUOUS INTEGRATION BY 
CODE-CHURN BASED TEST SELECTION 


Software development organizations need to meet the
demands of rapidly changing requirements and to release new
products and features more often and much faster. A typical
initiative to meet these demands has been the transition
from a traditional waterfall software development environment
towards agile practices, which embrace change and
emphasize customer collaboration.


A. Related Work 


We have identified related work in three areas: test
selection and test prioritization, continuous integration, and
visualization of large quantities of data for decision support.

1) Test Selection and Prioritization: Test case selection
has been studied from a number of perspectives. One of these
perspectives is increasing test coverage by automated
means.

2) Continuous Integration: The Continuous Integration
Visualization Technique (CIViT) [14] provides an overview of
end-to-end testing activities.

3) Metrics Visualization: Visualization of source code
changes using heatmaps has been used in the context of daily
follow-up of project progress.


B. CCTS: Code-churn Based Test Selection Method 


The CCTS method comprises two parts:

• Historical analysis of code churns and test execution
results.

• Finding the optimal test suite using precision and recall
metrics.

1) Historical Analysis of Code Churns and Test Execution
Results: The historical analysis takes two inputs: the list of
source code changes and the results of test case execution. The
method creates a contingency table that shows how often a
test case fails when there is a source code change on that particular
day.
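
A contingency table of this kind can be sketched in a few lines of Python. The record layout used here (day, module) and (day, test case, passed) is an assumption made for illustration, as are the module and test names; the surveyed work may organize its data differently.

```python
from collections import defaultdict

# Hypothetical input records; the field layout is an assumption.
changes = [            # (day, changed module)
    ("2018-01-01", "net"),
    ("2018-01-01", "ui"),
    ("2018-01-02", "net"),
    ("2018-01-03", "ui"),
]
test_results = [       # (day, test case, passed?)
    ("2018-01-01", "t_socket", False),
    ("2018-01-01", "t_layout", True),
    ("2018-01-02", "t_socket", False),
    ("2018-01-03", "t_layout", False),
    ("2018-01-03", "t_socket", True),
]


def build_contingency_table(changes, test_results):
    """Count, per (module, test) pair, the days on which the module changed
    and the test failed."""
    modules_by_day = defaultdict(set)
    for day, module in changes:
        modules_by_day[day].add(module)

    table = defaultdict(int)
    for day, test, passed in test_results:
        if passed:
            continue
        for module in modules_by_day.get(day, ()):
            table[(module, test)] += 1
    return dict(table)


table = build_contingency_table(changes, test_results)
```

High counts for a (module, test) pair suggest that the test is worth running whenever that module changes.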


2) Finding Optimal Test Suite: The automatic recommender
of the optimal test suite takes as input:


• The contingency table

• The list of recently changed modules


To find the optimal test suite we use the information
retrieval measures recall, precision and the F-measure. These
measures are based on four categories of outcomes:


• True positives

• False positives

• True negatives

• False negatives
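
Using the standard information retrieval definitions, the three measures can be computed as shown in the Python sketch below. The set-based interface (`selected` tests versus actually `failing` tests) and the test names are assumptions for illustration; true negatives are not needed for these particular measures.

```python
def precision_recall_f(selected, failing):
    """Return (precision, recall, f_measure) for a selected test suite.

    selected: set of tests recommended for execution
    failing:  set of tests that actually fail
    """
    true_positives = len(selected & failing)
    false_positives = len(selected - failing)
    false_negatives = len(failing - selected)

    precision = true_positives / (true_positives + false_positives) if selected else 0.0
    recall = true_positives / (true_positives + false_negatives) if failing else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


selected = {"t1", "t2", "t3", "t4"}
failing = {"t1", "t2", "t5"}
precision, recall, f_measure = precision_recall_f(selected, failing)
```

Here two of the four selected tests fail (precision 1/2) and two of the three failing tests were selected (recall 2/3), so the F-measure, the harmonic mean of the two, is 4/7.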


C. Case Study Design 


We evaluate the CCTS method at two companies, Ericsson
and Axis Communications. We perform the evaluation by
collecting historical data and determining which test cases should
be selected in order to optimize the effectiveness of the test suite.
The case study consists of two cases. The data is collected using
the following methods:


1) Document analysis 

2) Semi-structured interviews 

3) Focus group discussions iteratively with key representa- 
tives from development and management 

4) Group discussions with testing experts 


D. Collaborating Companies 


1) System level testing at Ericsson 
2) System level testing at Axis Communications 


E. Result 


In this section we show the statistical models from both 
collaborating companies visualized as heatmaps and we sum- 
marize the results from the interviews and group discussions. 


1) Identifying efficient tests: statistical models 
2) Simulating test selection using CCTS 
3) Scenarios for Applying CCTS 


VI. IMPLEMENTATION OF A CONTINUOUS INTEGRATION 
WORKFLOW FOR A POWER SYSTEM MODELICA LIBRARY. 


Today's power systems are expanding, as grids of different
areas are interconnected to form larger, geographically spread
systems. System complexity also increases as more
renewable energy sources and power-electronic-based devices
are connected to the grid. This places higher requirements on
dynamic studies of power systems using computer simulations.


A. Drawbacks of Conventional Power System Simulation Tools 


The users of power system simulation tools have widely
accepted and used a few proprietary tools for phasor
time-domain simulations. These tools ship with pre-compiled
model libraries, and thus much of the information related
to their actual implementation is inaccessible. The modeling
philosophy of these tools is rarely questioned, and often
overlooked when analyzing simulation results.


B. OpenIPSL: Modelica for Power Systems


A power system component library, the iTesla Power System
Library (iPSL), was developed using Modelica [15], [16]. This
library was further developed by the authors into the
Open-Instance Power System Library (OpenIPSL). Modelica
was adopted for these libraries because it is an open-source,
object-oriented, multi-domain and standardized modeling
language.


C. Software Development Practices 


Software development methods have advanced considerably when
compared to those of the power system community. These
improvements seek to manage code changes, ensure code quality and
enable continuous shipment of revisions to the users.


D. Paper Contributions and Organization 


This paper presents the methodology, software architecture, 
and prototype implementation that are used in the proposed 
continuous integration workflow in OpenIPSL. 


E. Development Workflow and Identified Issues 


The development of the iPSL began with the iTesla project.
At the time, the development group had very little knowledge
of Modelica, the library grew organically, and there was no
well-defined collaboration strategy.

• Version control system

• Feature-branch

• SW-to-SW validation


F. Continuous Integration Solution 


The facilities that perform automated code checks and
validation tests are referred to as continuous integration
services.


G. Testing and Validation Methodology 


The procedure for testing and validating the models in the
OpenIPSL comprises the following:


1) Test system 

2) Reference trace 

3) Model check 

4) Model validation (behavior verification) 
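
The model validation step compares a simulated trace against a reference trace. One possible acceptance check is sketched below in Python; the root-mean-square error metric, the tolerance value and the sample traces are assumptions chosen for illustration, since the text does not state which metric the OpenIPSL workflow uses.

```python
import math


def rmse(simulated, reference):
    """Root-mean-square error between two equally sampled traces."""
    if len(simulated) != len(reference) or not reference:
        raise ValueError("traces must be non-empty and of equal length")
    squared = [(s - r) ** 2 for s, r in zip(simulated, reference)]
    return math.sqrt(sum(squared) / len(squared))


def validate(simulated, reference, tolerance=1e-3):
    """Accept the model if its trace stays within tolerance of the reference."""
    return rmse(simulated, reference) <= tolerance


# Hypothetical traces of one simulated variable at four time points.
reference_trace = [1.000, 1.010, 1.020, 1.015]
good_trace = [1.000, 1.0101, 1.0199, 1.015]
bad_trace = [1.000, 1.050, 1.100, 1.015]

good_ok = validate(good_trace, reference_trace)
bad_ok = validate(bad_trace, reference_trace)
```

A trace that deviates only slightly from the reference is accepted, whereas a trace with a clearly different response is rejected and would cause the test to fail.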


H. Architecture 


The goal was to set up an automated system. To achieve
the desired functionality, several free, open-source tools and
services were combined.


Fig. 1. Architecture of proposed continuous integration methodology


• Technologies: A brief description and context for each
employed technology are given below. Git is the distributed
version control system used for the development
of the OpenIPSL. It keeps track of changes
made to the source code and provides the facility to
merge several development branches together. GitHub
is a hosting platform for Git repositories used for the
OpenIPSL project. It facilitates collaborative development.
Moreover, it provides additional facilities for documentation
and issue tracking that are also accessible to the entire
GitHub community. Travis CI is a web-based continuous
integration (CI) platform integrated with GitHub. It can
be configured to trigger testing routines every time a
developer pushes to the repository or creates a pull
request to the master branch. Docker is an open-source
program that lets users package an application and its
dependencies into a standardized unit (i.e. a container).

• Workflow: It was decided to build a CI service that
would trigger for every commit and pull request sent
to the master branch of the OpenIPSL repository. The
automated process depicted in Fig. 1 is described as
follows:


i) A pull request is created on the OpenIPSL repository, or a
developer pushes commits.

ii) Upon start, the submitted code is fetched from the source code
repository.

iii) The continuous integration service starts and a new
metacloud instance is created (Travis CI platform), which also
pulls the image from Docker Hub and the reference
traces, to be used later for validation purposes, from a
local FTP server.

iv) Within the Docker container, Python scripts start
OpenModelica, execute the model checks and carry
out the SW-to-SW validation procedure.

v) If a model does not pass the tests, the
Python script and the Docker container exit with the
flag 1, meaning that the test failed. Conversely, they
exit with the flag 0 when all of the models pass the
tests.

vi) Test results are reported back to GitHub and, depending
on the test result, the pull request is allowed
or blocked. Travis CI also preserves a snapshot of
the process, which can be used to diagnose potential
failures.
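
The exit-flag convention in step v) can be illustrated with a short Python sketch. The function names, the model names and the stand-in `check` callbacks are hypothetical; an actual script would start OpenModelica and compare the results with the reference traces instead.

```python
import sys


def check_models(models, check):
    """Return the names of models for which check(model) is False."""
    return [name for name in models if not check(name)]


def exit_code(failures):
    """Map failures to a process exit flag: 0 = all passed, 1 = a test failed."""
    return 1 if failures else 0


def main(models, check):
    failures = check_models(models, check)
    for name in failures:
        print(f"FAILED: {name}", file=sys.stderr)
    return exit_code(failures)


# Hypothetical model names and stand-in checks.
MODELS = ["ModelA", "ModelB", "ModelC"]
all_pass = main(MODELS, check=lambda name: True)
one_fails = main(MODELS, check=lambda name: name != "ModelB")
```

In a real script the value returned by `main` would be passed to `sys.exit`, so that the CI platform sees a non-zero status and blocks the pull request when any model fails.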


I. Illustrative Example 


In this section, errors are intentionally introduced to
illustrate the automatic detection of errors in the submitted code.


• Test Model and Testing Case: The errors are introduced
in the Modelica code of the IEEEX1 excitation
system. This component was originally implemented in
the IEEE standard.

• Synthetic Testing: A set of syntax errors is introduced
in the form of parameter naming.

• Synthetic Testing Results: The errors described in the
previous section are checked.


VII. CONCLUSION 


Based on findings from other disciplines, we suggest that CI
waiting time should be modified by giving feedback, controlling
perceived waiting time and having different waiting times
for different tasks. CI research has found that build
waiting time has CI-specific, cognitive and emotional effects.
An optimal waiting time of 2-10 minutes has been suggested, but we
found that there is a lack of empirical research on this. Other
waiting time research confirms and elaborates the negative
cognitive and emotional effects presented in CI research.

From our proposed conceptual processes to the actual
implementation, it is shown that applying a CI approach to
modern client-side web applications (MCSWA) is not complicated.
It can be done by following our processes step by step, then
combining the steps and setting the conditions under which each
process connects to the others. CI works
efficiently when developers commit good code and tests, which
helps the automated processes detect unpredictable and
unforeseeable errors. Finally, our experiment shows that the
CI approach also works with MCSWA.

Generally, traditional development models correct program
errors and software defects in the testing phase. CI
with WACIP is especially concerned with identifying errors and
defects early. When errors and defects are corrected
or removed in time, Web app development risks can be efficiently
reduced. A Web app must quickly adapt to environment changes
and be continuously adjusted and modified to
meet the diversified needs of its users. How to
reduce the development risks of Web apps is therefore an important
concern.

DevOps only defines a set of principles; how, and using
what technology, an organization adopts these principles
is entirely for the organization to evaluate and decide.
Even within a single organization, different teams might
need to adopt different technologies or tools to follow DevOps
approaches, which is perfectly acceptable, the whole purpose being to
continuously optimize and transform.

Continuous integration requires new ways of selecting test
cases and more automation in that area, but this
automation also provides new possibilities. One such
possibility is the automated collection of statistics on test case
executions and code churn. In this paper
we explored the possibilities of using the new data as input
to an automated test selection algorithm. The initial evaluation
of the method at Ericsson and Axis Communications showed
that the approach is feasible and can be used in practice in
two scenarios. The first scenario is the restructuring of the test
suites and the other is the redistribution of the test scope levels. In
our future work we intend to develop a tool for orchestrating
test cases to further refine our method and provide a smarter
test infrastructure.

The increasing size of power systems and the complexity of
their component models require new modeling and simulation
technologies. The methodology and software architecture
for continuous power system model development presented
in this paper is an attempt to bridge the gap between software
development practices found in software engineering and those
commonly used for power system modeling and simulation.
The systematic approach to model checking, verification and
validation will not only contribute to eradicating common
development mistakes, but also make it possible to engage the community
in contributing to the development of the OpenIPSL library
by facilitating the integration of others' work. In the future,
the development and testing of Modelica models may become
of great importance because it could affect simulations of
power systems at regional and even continental level, especially
now that Modelica has been adopted as the language to be
used in the definition of dynamic models.

A. Future Work

The proposed architecture is a prototype implementation yet to be
fully deployed on the OpenIPSL. Upon deployment, other
functionalities will be considered, e.g. automated test case
generation, new validation metrics, etc. The configuration files
for the testing infrastructure are stored in the same repository,
inviting further developments from the community.


Abstract—Security has become a critical issue for the research
community in the field of information security. Several solutions
and standards have been introduced in response to recent
security requirements in order to enhance security. This
paper surveys various security issues and their solutions in
Wireless Mesh Networks (WMNs), wireless networks, email,
e-commerce, the Internet of Things (IoT) and data storage. The paper
discusses a wide variety of attacks in WMNs and wireless networks,
their classification mechanisms and the different security measures
available to handle them, including the challenges faced. It then
concentrates on existing enhancements to email security: some focus
on keeping the exchange of data via email confidential and integral,
while others focus on authenticating the sender and proving that the
sender cannot repudiate a message. E-commerce security
is a part of the information security framework and is applied
to the components that affect e-commerce, including computer
security and data security. We then survey the security
flaws existing in the IoT that may prove
very detrimental to the development and implementation of IoT
in different fields. We also review various solutions for ensuring
that data stored in the cloud is not tampered with or corrupted by
service providers or other attack agents, using various types
of challenge-response schemes to test the service provider
occasionally and verify that the data
is correct. We introduce different models and techniques used to
solve and enhance security, and evaluate each area from the
viewpoint of security.

Index Terms—Wireless Sensor Network, Security, Privacy, 
Cloud Storage, Secure Routing, RFID, DDoS Attack 


I. INTRODUCTION 


Security, in information technology (IT), is the defense of
digital information and IT assets against internal and external,
malicious and accidental threats. This defense includes
detection, prevention and response to threats through the use
of security policies, software tools and IT services. Security is
critical for enterprises and organizations of all sizes and in
all industries. Information security, also called infosec,
encompasses a broad set of strategies for managing the processes, tools
and policies that aim to prevent, detect and respond to threats
to both digital and nondigital information assets.

Wireless mobile communication has grown dramatically
in the last decades. Due to the increased usage of wireless mobile
networks and communications in everyday life, society
has become extremely exposed to cyber security attacks and
threats in this environment. In order to provide a basis for
further detailed discussion of attacks and security mechanisms
in this challenging environment, this paper provides a brief
overview of security issues, i.e., attacks, vulnerabilities and
threats, in wireless networks. Based on the review, it may be
concluded that future steps in this domain should include a
detailed investigation of security issues in all addressed networks
and their categorization in a unified manner, whether by the place
they occur, network level, type of damage they cause, security
breach level, etc., accompanied by a description of the proposed
classes. IoT security is the area of endeavor concerned with
safeguarding connected devices and networks in the Internet
of Things (IoT).

The Internet of Things involves the increasing prevalence 
of objects and entities, known in this context as things, 
that are provided with unique identifiers and the ability to 
transfer data automatically over a network. Much of the 
increase in IoT communication comes from computing devices 
and embedded sensor systems used in industrial 
machine-to-machine (M2M) communication, smart energy grids, 
home and building automation, vehicle-to-vehicle 
communication and wearable computing devices. The main 
problem is that, because the idea of networking appliances 
and other objects is relatively new, security has not always 
been considered in product design. IoT products are often 
sold with old and unpatched embedded operating systems and 
software. Furthermore, purchasers often fail to change the 
default passwords on smart devices or, if they do change 
them, fail to select sufficiently strong passwords. To 
improve security, an IoT device that needs to be directly 
accessible over the Internet should be segmented into its 
own network and have its network access restricted. 
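
The segmentation idea can be illustrated with a minimal Python 
sketch. The subnet, the allow-listed destination and the function 
names are all hypothetical values chosen for illustration; a real 
deployment would enforce such a policy in a firewall or router 
rather than in application code. 

```python
import ipaddress

# Hypothetical address plan: IoT devices live in their own subnet.
IOT_SEGMENT = ipaddress.ip_network("192.168.50.0/24")
# Hypothetical allow-list: the only destinations IoT devices may reach.
ALLOWED_DESTINATIONS = [ipaddress.ip_network("203.0.113.10/32")]


def is_iot_device(addr: str) -> bool:
    """Return True if the address belongs to the isolated IoT segment."""
    return ipaddress.ip_address(addr) in IOT_SEGMENT


def is_flow_allowed(src: str, dst: str) -> bool:
    """Allow traffic from the IoT segment only to allow-listed destinations."""
    if not is_iot_device(src):
        return True  # this sketch only polices traffic leaving the IoT segment
    dst_ip = ipaddress.ip_address(dst)
    return any(dst_ip in net for net in ALLOWED_DESTINATIONS)
```

Under these assumptions, a camera at 192.168.50.7 may contact its 
update server at 203.0.113.10 but not a laptop at 192.168.1.20. 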

The network segment should then be monitored to identify 
potentially anomalous traffic, and action should be taken if 
there is a problem. 

Security experts have warned of the potential risk of large 
numbers of unsecured devices connecting to the Internet 
since the IoT concept was first proposed in the late 1990s. 
In December of 2013, a researcher at Proofpoint, 
an enterprise security firm, discovered the first IoT botnet. 
According to Proofpoint, more than 25 percent of the botnet 
was made up of devices other than computers, including smart 
TVs, baby monitors and other household appliances. 

Email security is a broad term that encompasses multiple 
techniques used to secure an email service. From an 
individual/end user standpoint, proactive email security 
measures include: 


• Strong passwords 

• Password rotations 

• Spam filters 

• Desktop-based anti-virus/anti-spam applications 


Similarly, a service provider ensures email security by 
using strong password and access control mechanisms on the 
email server, and by encrypting and digitally signing email 
messages while they are in the inbox or in transit to or 
from a subscriber's email address. It also implements 
firewalls and software-based spam filtering applications to 
keep unsolicited, untrustworthy and malicious email messages 
from being delivered to a user's inbox. 

E-commerce security refers to the principles that guide 
safe electronic transactions, allowing the buying and 
selling of goods and services through the Internet with 
protocols in place to provide safety for those involved. 
Successful online business depends on the customers' trust 
that a company has e-commerce security basics in place. 


II. METHODOLOGY 


In this paper we study the security issues in different 
areas such as wireless networks, e-mail, wireless mesh 
networks, the Internet of Things (IoT), e-commerce and cloud 
storage systems. We then discuss the control measures that 
can enhance the security level in each area. 


A. Wireless Network Security 


Wireless networking provides numerous opportunities to 
increase productivity, but it is inherently insecure. From 
jamming to eavesdropping, from man-in-the-middle to 
spoofing, there are a variety of attack methods that can be 
used against the users of wireless networks. Modern wireless 
data networks use a variety of cryptographic techniques, 
such as encryption and authentication, to provide barriers 
to such infiltrations. Wireless communication technology is 
nevertheless exposed to various types of security threats. 
This part of the study covers a wide variety of attacks on 
WSNs, the ways in which they are classified, and the 
security mechanisms available to handle them, including the 
challenges faced. 

This study considers the threats and vulnerabilities 
associated with each of the three basic technology 
components of wireless networks (clients, access points, and 
the transmission medium) and the commonly available 
countermeasures that could be used to mitigate those 
risks [1]. A combined effort of users, employers and system 
administrators is required in order to fight against such 
malicious activities. Appropriate countermeasures in every 
form can help the organization minimize the risk of illegal 
penetration. Up-to-date tools, constant monitoring, proper 
management and appropriate countermeasures are the ultimate 
weapons to fight against wireless security attacks. 

Wireless networks serve as the transport mechanism between 
devices and among devices and the traditional wired 
networks (enterprise networks and the Internet). Wireless 
networks are many and diverse but are frequently categorized 
into three groups based on their coverage range: Wireless 
Wide Area Networks (WWAN), Wireless Local Area Networks 
(WLAN), and Wireless Personal Area Networks (WPAN). WWAN 
includes wide coverage area technologies such as 2G 
cellular, Cellular Digital Packet Data (CDPD), Global System 
for Mobile Communications (GSM), and Mobitex. WLAN includes 
802.11, HiperLAN, and several others [10]. 

The 802.11 standard provides a number of options for 
authentication. Here we discuss the two that provide the most 
protection from unauthorized users [7]. The IEEE 802.11 
standard permits devices to establish either peer-to-peer 
(P2P) networks or networks based on fixed access points (AP) 
with which mobile nodes can communicate. Hence, the standard 
defines two basic network topologies: the infrastructure 
network and the ad hoc network. The infrastructure network 
is meant to extend the range of the wired LAN to wireless 
cells. This section outlines the architecture, features and 
taxonomy. Figures 1 and 2 show the infrastructure mode and 
the IEEE 802.11 ad hoc mode architectures. 

Fig. 1. Infrastructure Mode 


Fig. 2. IEEE 802.11 Ad Hoc Mode Architecture 


A layered approach to wireless security can provide a high 
degree of protection and leverage existing network security 
investments. The layered approach consists of the following 
four levels: 


• Wireless deployment and policy 
• Wireless access control 
• Perimeter security 
• Application security 


Organizations can mitigate risks to WLANs by applying 
countermeasures that address specific threats and 
vulnerabilities. Countermeasures at the management, 
operational and technical levels can be effective in 
reducing the risks commonly associated with WLANs [1]. 


B. Security Enhancements in Various E-mail Systems 


Nowadays, most people and organizations use e-mail to 
exchange information for many different purposes. E-mail is 
one of the most important network applications. It is 
especially significant when business, health and educational 
communities use it to exchange critical information such as 
business information, patient health records and so on. 
Currently available e-mail standards protect e-mail messages 
using standard cryptographic techniques and formats such as 
PGP and S/MIME. This paper surveys various e-mail security 
solutions. It introduces different models and techniques 
used to enhance the security of e-mail systems and evaluates 
each one from the viewpoint of security. Current e-mail 
systems have many serious problems, the most important of 
which are the following: 


• The authentication mechanism based on user name and 
password is considered very weak, because an attacker can 
easily guess the password using a dictionary attack and 
break the authentication mechanism. 

• Protection of mailboxes and e-mail messages on mail 
servers depends on Operating System (OS) security. If 
OS security is not properly configured and policies are 
not enforced, an attacker can easily gain access to these 
mailboxes and e-mail messages. 

• Most e-mail users send e-mails in the clear, because 
they do not have sufficient knowledge to configure security 
parameters. An attacker can therefore easily read and 
modify e-mail letters. 

• Most current e-mail systems handle attachment files in a 
conventional and inefficient way. 

• Most e-mail clients and servers do not support an 
effective mechanism for confirming delivery of e-mail 
letters. 
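
The first weakness in the list can be made concrete with a small 
Python sketch of a dictionary attack against a stored password 
hash. The word list, the salt and the function names are 
hypothetical, and PBKDF2 is used here only as a convenient 
standard-library hash; the sketch does not describe how any 
particular mail server stores passwords. 

```python
import hashlib


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a hash from a password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 10_000)


def dictionary_attack(stored_hash: bytes, salt: bytes, wordlist):
    """Try each candidate word; return the matching password or None."""
    for candidate in wordlist:
        if hash_password(candidate, salt) == stored_hash:
            return candidate
    return None


# Hypothetical data: a weak password that appears in a common word list.
SALT = b"demo-salt"
COMMON_WORDS = ["123456", "password", "qwerty", "sunshine"]
weak_hash = hash_password("sunshine", SALT)
recovered = dictionary_attack(weak_hash, SALT, COMMON_WORDS)
```

A password that appears in the word list is recovered after a 
handful of guesses, whereas a long random password is not found by 
this list at all, which is why password strength matters so much 
for this form of authentication. 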


E-mail security is the protection of information from 
unauthorized access, use, disclosure, destruction, 
modification and harm. Information privacy, integrity and 
accessibility are the main elements on which e-mail security 
rests. Figure 3 provides a rough overview of security 
techniques and enhancements. This paper includes background 
on the existing techniques used to handle the security of 
e-mail systems and introduces the various enhancements that 
have been applied to the security of e-mail systems. These 
enhancements include: 


1) E-mail Message Confidentiality. 
2) E-mail Message Integrity. 
3) Message sender Authentication. 
4) Message sender Non-Repudiation. 

Fig. 3. 


The most promising techniques are encryption and digital 
signatures. Encryption provides data confidentiality, 
whereas the digital signature provides the remaining 
enhancements (integrity, authentication and 
non-repudiation). Many protocols have been proposed that 
offer encryption together with digital signatures; the best 
known are PGP (Pretty Good Privacy) and S/MIME 
(Secure/Multipurpose Internet Mail Extensions). Some of 
these solutions target a particular enhancement, such as 
authenticated e-mail systems and confidentiality and 
privacy e-mail systems [2]. Some enhancements, such as 
integrity and non-repudiation, need to be integrated with 
other enhancements to give the solution a high level of 
security. 
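
As a rough illustration of message integrity and sender 
authentication, the following Python sketch tags a message with a 
keyed hash (HMAC). This is a deliberately swapped-in technique: PGP 
and S/MIME use public-key digital signatures, which the Python 
standard library does not provide. An HMAC relies on a secret 
shared by both parties, so it cannot provide non-repudiation. The 
key and function names are hypothetical. 

```python
import hashlib
import hmac


def tag_message(key: bytes, body: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the message body."""
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def verify_message(key: bytes, body: bytes, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(tag_message(key, body), tag)


# Hypothetical shared secret and message.
SHARED_KEY = b"key-known-to-sender-and-receiver"
original = b"Please transfer the report by Friday."
tag = tag_message(SHARED_KEY, original)
```

The receiver accepts the message only if `verify_message` returns 
True; changing a single byte of the body, or using a different key, 
makes the check fail. 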

The main vulnerability in the Simple Mail Transfer Protocol 
(SMTP) is that users are not authenticated, which allows 
spoofing attacks. A model has been proposed to enhance 
e-mail authentication, prevent e-mail address spoofing and 
overcome the SMTP authentication vulnerability. The proposed 
solution works by authenticating the domain of the sender. 
On the user side, it obtains the sender host's information 
from the Received field of an e-mail to determine whether 
the e-mail is spoofed or not. 
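
The idea of comparing the sender's domain with the host named in 
the Received field can be sketched in Python as follows. This is a 
naive heuristic written for illustration, not the proposed model 
itself: the regular expression, the function names and the rule 
that the earliest relay must belong to the sender's domain are all 
assumptions. 

```python
import re
from email import message_from_string
from email.utils import parseaddr

# Assumed pattern: a Received value begins with "from <hostname>".
RECEIVED_FROM = re.compile(r"from\s+([A-Za-z0-9.-]+)", re.IGNORECASE)


def sender_domain(msg) -> str:
    """Extract the domain part of the address in the From header."""
    _, addr = parseaddr(msg.get("From", ""))
    return addr.rpartition("@")[2].lower()


def originating_host(msg) -> str:
    """Return the host named in the earliest Received header, if any."""
    received = msg.get_all("Received") or []
    if not received:
        return ""
    # Each relay prepends its own header, so the last one is the first hop.
    match = RECEIVED_FROM.search(received[-1])
    return match.group(1).lower() if match else ""


def looks_spoofed(raw: str) -> bool:
    """Flag a message whose first relay is outside the sender's domain."""
    msg = message_from_string(raw)
    domain = sender_domain(msg)
    host = originating_host(msg)
    if not domain or not host:
        return True  # missing information is treated as suspicious
    return not (host == domain or host.endswith("." + domain))
```

Many legitimate messages are relayed through third-party servers, 
so a check of this kind would produce false alarms in practice and 
is meant only to show the principle. 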

A secure e-mail system has also been proposed to provide 
privacy of the locations from which the e-mail system is 
accessed, protection of e-mails, and authentication of 
legitimate users. This system considers several principles 
in terms of user friendliness, usability, scaling, privacy 
of users, and security of e-mails. The secure e-mail proxy 
interactions can be divided into four steps, as shown below: 


• Step 1: The user logs into the secure e-mail proxy using 
the e-mail address and password of his/her native e-mail 
system. 

• Step 2: For new users, the proxy generates a key pair, 
sends the public key to the CA server, and receives and 
stores the user's certificate. 

• Step 3: The proxy fetches the e-mails from the 
corresponding e-mail server. Upon a double click, each 
letter is cryptographically processed and displayed. 

• Step 4: The user can now send signed and encrypted 
e-mails. 

Another solution proposed to provide a secure, 
high-assurance and very reliable e-mail system is Crypto 
NET. This system handles standard e-mail security services 
such as signed and encrypted e-mail. Integrating more than 
one security enhancement in a single e-mail system improves 
its security significantly and protects the e-mail system 
from different flaws. The proposed e-mail security system is 
a complete end-to-end system that uses an improved 
encryption/decryption algorithm integrated with the user's 
thumbprint biometric features [2]. 


C. Security & Privacy Issues in Cloud Storage Systems 


The importance of cloud storage systems is increasing, 
and they are receiving growing attention in the scientific 
and industrial communities. They are used to easily retrieve 
the files we have uploaded to online file storage 
applications. The security properties of a cloud storage 
system are confidentiality, integrity, write serializability 
and read freshness. These properties ensure that users' data 
is always secure, cannot be modified by unauthorized users, 
and is always at the latest version when retrieved by the 
user [3]. 
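
Two of these properties, integrity and read freshness, can be 
illustrated with a short Python sketch. It assumes a single writer 
who keeps a small local record (a version counter and a SHA-256 
digest) for each object; the class and function names are 
hypothetical and do not correspond to any particular cloud storage 
API. 

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClientRecord:
    """What the client remembers locally about its latest write."""
    version: int
    digest: str


def digest_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def record_write(previous: Optional[ClientRecord], data: bytes) -> ClientRecord:
    """Update the local record whenever the client uploads new data."""
    next_version = 1 if previous is None else previous.version + 1
    return ClientRecord(next_version, digest_of(data))


def check_read(record: ClientRecord, returned_version: int, returned_data: bytes) -> str:
    """Classify what the storage service returned for a read."""
    if returned_version < record.version:
        return "stale"      # read freshness violated
    if digest_of(returned_data) != record.digest:
        return "corrupted"  # integrity violated
    return "ok"
```

If the service returns an older version, the check reports a 
freshness violation; if the bytes differ from what was written, it 
reports an integrity violation. 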

1) Characteristics of Cloud Service: One of the core 
themes of cloud computing and cloud storage is that service 
should be independent of the location. Some of the character- 
istics of cloud services are as follows. 

i) Location Independent Services: The defining 
characteristic of cloud computing services is the ability 
to provide services to clients irrespective of the 
location of the provider; the underlying physical hardware 
could be moved anywhere, but the services should still be 
available. 

ii) Communications: A communication line could exist for 
anything from a few seconds to hours, depending on the 
services being consumed. The security of these 
communication lines should therefore persist at least as 
long as the connection between the provider and the 
consumer exists, and should cover some buffer period too. 

iii) Infrastructure: The infrastructure used for these 
services should be secured appropriately to avoid any 
potential security threats, and the security should cover 
the lifetime of each component. This lifetime can be 
estimated to be about 10 years. 

iv) Storage Security: The data stored on cloud services 
often lasts longer than the security that can be ensured 
for the components used to store or process it. The 
storage services should therefore be robust enough to 
accommodate component and hardware changes easily and 
transparently. The same applies to the algorithms and 
encryption schemes used to secure the data; they could 
become obsolete and turn into easy targets for brute-force 
attacks as the processing power of devices keeps 
increasing. 

v) Backup Storage: Here the security should outlast 
general storage security, and the life span can be assumed 
to be greater than thirty years. As with normal storage 
services, the technologies should be resilient to 
component and hardware changes as well as to changes in 
the algorithms used to store the data. 


Figure 4 shows some of the major cloud storage systems in 
common use today. 

Fig. 4. Cloud storage 


Security issues can be classified into two parts: access 
security and service security. Access security concerns the 
communication with the cloud service provider, which is a 
potential point at which the service is exposed to threats. 
Service security concerns threats at the point of service 
provision, which could include the actual device security at 
the cloud provider and the storage security used by the 
provider. 

2) Stakeholders in Cloud Service: There are various 
stakeholders in a cloud service environment: 


i) Individual Users: This is the very large number of 
individual consumers. 
ii) Aggregate Users: These are users in a group, such as 
organizations or corporates. 
iii) Cloud Service Providers: These are service providers 
which provide on-demand services such as computing, 
storage or other related services. 


3) Cloud Storage: Several urgent and significant issues 
arise during the use of cloud storage services. They are 
trust in data stored in the cloud, the lack of provable 
security in cloud service provider agreements, data history, 
provable data possession, and the use of cloud storage 
services as online slack space. 

i) Untrusted Cloud Storage: Cloud storage offers scalable, 
always-available data storage at reduced cost. Even with 
all the features the client needs, however, buggy 
software, hardware failures and malicious attacks can make 
the service behave inconsistently. A valid solution to 
this problem is called Depot. 

ii) Reliable Cloud Storage and Data History: Cloud storage 
is being used as a means to store backups of local systems 
and other user or application data. However, data 
provenance, an important property that we have come to 
associate with desktop storage systems and that supports 
data security, is missing in cloud storage systems. 
Provenance of data objects in a cloud storage environment 
is extremely important because data in the cloud is shared 
in most scenarios, so data consumers should be able to 
know how the data was updated and how trustworthy it is. 

A simple way to store the provenance of a data object such 
as a file is to create a directed acyclic graph (DAG) in 
which each node represents a file. The graph is acyclic by 
definition, because a cycle would indicate that an object 
is its own ancestor. A good and effective solution 
involves using the PASS system to collect provenance data 
during storage operations to the cloud and to update these 
data in the cloud separately. PASS records various 
attributes such as the file name, the name of the process 
that created it, the file id, etc. For the provenance of 
data to be suitable for cloud storage, the following 
properties need to be adhered to: 

• Data-Independent Persistence: This property ensures that 
the cloud stores the provenance of a data object even if 
the data is removed from the system. 

• Provenance Data Coupling: The provenance of a data 
object should describe the object completely and 
accurately and should be tightly coupled to it. 

• Multi-Object Causal Ordering: This ensures the causal 
relationship between objects. 

iii) Provable Security in Cloud Storage Systems: Along 
with securing the stored data, there should also be a 
technique to show, as well as prove, that the data is 
being stored with the maximum available security measures. 
Most existing systems are not effective in this scenario 
because they were not designed for a cloud storage system 
and were more suitable for a locally deployed personal 
storage system. In addition, these systems were extremely 
capable of detecting data corruption or server misbehavior 
but were not able to provide suitable proofs of the 
corruption or misbehavior. Another problem with those 
solutions was their apparent limitation in scalability, 
which is a very important aspect of the service a cloud 
service provider offers. 

iv) Error/Malicious Node Localization: A possible security 
threat in this scenario is a malicious and self-interested 
cloud service provider that, for monetary reasons, moves 
less frequently used data to secondary storage devices or 
tries to hide data corruption or loss incidents caused by 
internal issues. 

v) Preventing online cloud storage as Slack Space: 
Consumer-based applications like Dropbox have a huge 
customer base, with millions of users and billions of 
files stored. However, the system design allowed only very 
weak security: it could easily be manipulated to breach 
the privacy of unsuspecting customers, and until recently 
uploaded files could be retrieved without much effort. 
These services have the potential to be used as a hidden 
channel to leak data stored on the system [3]. 


D. Security and Privacy Issues in Wireless Mesh Networks 


A Wireless Mesh Network (WMN) is a communication 
network made up of radio nodes organised in a mesh topology. 
WMNs are characterized by dynamic self-organization, 
self-configuration, easy maintenance and low cost. WMNs 
consist of mesh clients, mesh routers and gateways. The mesh 
clients are often laptops, cell phones and other wireless 
devices, while the mesh routers forward traffic to and from 
the gateways, which may, but need not, be connected to the 
Internet. The coverage area of the radio nodes working as a 
single network is sometimes called a mesh cloud. A mesh 
network is reliable and offers redundancy. When one node can 
no longer operate, the rest of the nodes can still 
communicate with each other, directly or through one or more 
intermediate nodes. Figure 5 depicts a mesh topology 
architecture. 

Fig. 5. Mesh topology 


1) Security challenges and issues in WMNs: The reasons 
that WMNs are more difficult to protect fully are: 


• Multi-hop nature 

• Multitier system security 

• Multisystem security 

2) Constraints in WMNs: The constraints that should be 
considered in WMNs are Central Processing Unit, Battery, 
Mobility, Bandwidth and Scalability. 

3) Security Issues in WMNs: The security issues for WMNs 
are Availability, Authenticity of network traffic, Integrity, 
Confidentiality, Authorization, Access Control, Fairness and 
Accountability. 

4) Security attacks in WMNs: Security attacks may be 
classified on the basis of several factors, such as the 
nature, the scope, the behavior, or the protocol layer that 
the attacker targets. Depending on whether the operation of 
the network is disrupted or not, attacks may be classified 
as active or passive. An active attack is conducted to 
intentionally disrupt the network operation, whereas a 
passive attack intends to steal information and to eavesdrop 
on the communication within the network. Active attacks can 
be further divided into internal and external. External 
attacks are conducted by attackers that do not participate 
in the mesh topology, usually by jamming the communication 
or injecting erroneous information [5]. Internal attacks are 
conducted by members of the mesh network. An attack can also 
be rational or malicious. In a rational attack, the 
adversary misbehaves only if misbehaving is worth something 
in terms of price, obtained quality of service, or resource 
saving; otherwise, the attack is characterized as malicious. 
Attacks may apply at different protocol layers of a WMN. 

i) Security attacks at the physical layer of WMNs: There 
are several types of attacks that can affect the physical 
layer of a WMN. First, because wireless mesh routers may 
be installed outdoors, an attacker may simply destroy the 
hardware of such a node. The wireless mesh routers may 
also be tampered with, and sensitive information may be 
extracted from them. The physical layer can also be 
affected by radio jamming devices, which may interfere 
with the physical channels and disturb the network's 
availability. 

ii) Security attacks at the MAC layer of WMNs: The 
attacks possible at the MAC layer of WMNs are passive 
eavesdropping and jamming attacks. The jamming attacks on 
a mesh network are the Unprompted Clear to Send (CTS) 
Attack, the Reactive Request to Send (RTS) Jamming Attack 
and CTS Corrupt Jamming. 

iii) Security attacks at the network layer of WMNs: An 
attacker could also target the network layer of WMNs. 
These attacks can be divided into two categories: 
control plane (or routing) attacks and data plane (or 
path forwarding) attacks. Control plane attacks target 
the routing functionality of the network, whereas data 
plane attacks target the path forwarding functionality 
of the network. 

iv) Security attacks at the transport layer of WMNs: 
Possible attacks in this layer are flooding and 
desynchronization, that is, the disruption of an existing 
connection. In the flooding attack, a malicious node may 
repeatedly make new connection requests until the 
resources required by each connection are exhausted or 
reach a maximum limit. In the desynchronization attack, a 
malicious node may repeatedly spoof messages to an end 
host, causing the host to request the retransmission of 
missed frames. If timed correctly, an attacker may degrade 
or even prevent the ability of the end hosts to exchange 
data successfully, causing them instead to waste energy 
attempting to recover from errors that never really 
existed. 

v) Security attacks at the application layer of WMNs: 
Application layer attacks in wireless networks concern 
viruses, worms, malicious code, application abuse, and so 
on. 
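
The transport-layer flooding attack described in item iv) can be 
simulated with a small Python sketch. The connection table, its 
capacity and the optional per-source cap are hypothetical; the cap 
is an assumption added here to show one way of limiting the damage 
and is not a countermeasure taken from the text. 

```python
from collections import Counter
from typing import Optional


class ConnectionTable:
    """A node with a bounded number of connection slots."""

    def __init__(self, capacity: int, per_source_limit: Optional[int] = None):
        self.capacity = capacity
        self.per_source_limit = per_source_limit
        self.open_by_source = Counter()

    def request(self, source: str) -> bool:
        """Try to open a connection; return False if it is refused."""
        if sum(self.open_by_source.values()) >= self.capacity:
            return False  # resources exhausted
        if (self.per_source_limit is not None
                and self.open_by_source[source] >= self.per_source_limit):
            return False  # this source already holds its share
        self.open_by_source[source] += 1
        return True


def flood(table: ConnectionTable, attacker: str, attempts: int) -> int:
    """Issue repeated requests from one node; return how many succeeded."""
    return sum(table.request(attacker) for _ in range(attempts))
```

Without a cap, ten requests from one malicious node fill a 
five-slot table and an honest node is refused; with a per-source 
cap of two, the same flood occupies only two slots. 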

5) Countermeasures for WMNs: 

i) Intrusion prevention mechanisms: Intrusion prevention 
mechanisms are considered the principal line of defense 
against malicious nodes and include encryption and 
authentication, as well as secure routing. 

ii) Secure routing: Because of the open medium, routing 
protocols are constantly victims of attacks that try to 
compromise their capabilities. The routing protocol used 
inside a mesh should therefore be secured against attacks. 
To achieve this goal, researchers have proposed either 
mechanisms to enhance existing routing protocols used for 
ad hoc networks or new security protocols that are 
suitable for WMNs. 

iii) Intrusion detection systems: Intrusion detection 
systems are also deployed to provide a second line of 
defense. Intrusion detection systems in wired or wireless 
networks are used to alert the users about possible 
attacks, ideally in time to stop the attack or mitigate 
the damage. They consist of three functions: event 
monitoring, analysis engine and response [5]. 
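
The three functions of an intrusion detection system can be 
sketched as a small Python pipeline. The event format, the 
threshold rule and the alert text are hypothetical choices made for 
illustration; real systems use far richer events and detection 
logic. 

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Event:
    source: str
    kind: str


def monitor(raw_log):
    """Event monitoring: turn raw (source, kind) tuples into events."""
    return [Event(source, kind) for source, kind in raw_log]


def analyse(events, threshold: int):
    """Analysis engine: flag sources with too many connection requests."""
    counts = Counter(e.source for e in events if e.kind == "conn_request")
    return sorted(src for src, n in counts.items() if n > threshold)


def respond(suspects):
    """Response: produce one alert per suspicious source."""
    return [f"ALERT: possible flooding from {s}" for s in suspects]


# Hypothetical traffic: node n1 sends six connection requests, n2 behaves.
log = [("n1", "conn_request")] * 6 + [("n2", "conn_request"), ("n2", "data")]
alerts = respond(analyse(monitor(log), threshold=5))
```

Keeping the three stages separate means the analysis rule can be 
replaced without touching how events are collected or how alerts 
are delivered. 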


E. Security and Privacy Issues in Internet of Things 


The Internet of Things (IoT) embodies the concept of the 
free flow of information among various embedded computing 
devices using the Internet as the mode of 
intercommunication. The term Internet of Things was first 
proposed by Kevin Ashton in the year 1999. With the aim of 
providing an advanced mode of communication between various 
systems and devices, as well as facilitating the interaction 
of humans with the virtual environment, IoT finds 
application in almost any field. But as with all things that 
use the Internet infrastructure for information exchange, 
IoT too is susceptible to various security issues and raises 
some major privacy concerns for end users [4]. 

1) Connectivity Technologies and Interaction Amongst 
Various Internet of Things (IoT) Devices: The automatic 
exchange of information between two systems or two devices 
without any manual input is the main objective of the 
Internet of Things. This automated information exchange 
between two devices takes place through some specific 
communication technologies, which are described below. 

i) Wireless Sensor Networks (WSN): WSNs are compositions 
of independent nodes whose wireless communication takes 
place over limited frequency and bandwidth. The 
communicating nodes of a typical wireless sensor network 
consist of the following parts: 

• Sensor 
• Microcontroller 
• Memory 
• Radio Transceiver 
• Battery 

ii) Radio Frequency Identification (RFID): RFID technology 
is mainly used in information tags that interact with each 
other automatically. It uses the wireless technology of 
Automatic Identification and Data Capture (AIDC). 

• RFID tags (Transponders): In an RFID tag, an antenna is 
embedded in a microchip. The RFID tag also contains memory 
units, which house a unique identifier known as the 
Electronic Product Code (EPC). According to the 
classification, the types of RFID tags are: 

a) Active tag: This type of tag houses a battery 
internally, which facilitates the interaction of its 
unique EPC with surrounding EPCs remotely from a limited 
distance. 

b) Passive tag: In this type of tag, the information relay 
of its EPC occurs only when it is activated by a 
transceiver within a pre-defined range of the tag. 

iii) RFID readers (Transceivers): The RFID reader 
functions as the identification detector of each tag 
through its interaction with the EPC of the tag under its 
scan. 


2) Security issues and privacy concerns: Some of the 
prominent security issues stemming from the communication 
technologies are the following: 

i) Security issues in the wireless sensor networks 
(WSNs): The attacks that can be performed in a wireless 
sensor network fall into three categories: 

• Attacks on secrecy and authentication 
• Silent attacks on service integrity 
• Attacks on network availability 

ii) DoS attack on the physical layer: This layer of the 
wireless sensor network is attacked mainly through: 

• Jamming: This type of DoS attack occupies the 
communication channel between the nodes, thus preventing 
them from communicating with each other. 

• Node tampering: Physical tampering with a node to 
extract sensitive information is known as node tampering. 

iii) DoS attack on the link layer: The DoS attacks taking 
place in this layer are: 

• Collision: This type of DoS attack can be initiated 
when two nodes simultaneously transmit packets of data on 
the same frequency channel. 

• Unfairness: Unfairness is a repeated collision-based 
attack. It can also be referred to as an exhaustion-based 
attack. 

• Battery Exhaustion: This type of DoS attack causes 
unusually high traffic in a channel, making its 
accessibility very limited to the nodes. 

iv) DoS attack on the network layer: The main function of 
the network layer of a WSN is routing. The specific DoS 
attacks taking place in this layer are: 

• Spoofing, replaying and misdirection of traffic. 

• Hello flood attack: This attack causes high traffic in 
channels by congesting the channel with an unusually high 
number of useless messages. 

• Homing: In a homing attack, the traffic is searched for 
cluster heads and key managers, which have the capability 
to shut down the entire network. 

• Selective forwarding: As the name suggests, in selective 
forwarding a compromised node sends data only to a 
selected few nodes instead of all the nodes. 

• Sybil: In a Sybil attack, the attacker replicates a 
single node and presents it with multiple identities to 
the other nodes. 

• Wormhole: This DoS attack causes relocation of bits of 
data from their original position in the network. 

• Acknowledgement flooding: Acknowledgements are required 
at times in sensor networks when routing algorithms are 
used. 

v) DoS attack on the transport layer: The DoS attacks in 
this layer are: 

• Flooding: This refers to deliberate congestion of 
communication channels through the relay of unnecessary 
messages and high traffic. 

• De-synchronization: In a de-synchronization attack, fake 
messages are created at one or both endpoints, requesting 
retransmissions to correct a non-existent error. 

vi) DoS attack on the application layer: In this layer, a 
path-based DoS attack is initiated by stimulating the 
sensor nodes to create huge traffic on the route towards 
the base station. 


vii) Security issues in RFID technology 

• Unauthorized tag disabling (Attack on authenticity): 
DoS attacks in RFID technology lead to incapacitation of 
the RFID tags, temporarily or permanently. 

• Unauthorized tag cloning (Attack on integrity): The 
capture of identification information (such as the EPC), 
especially through manipulation of the tags by rogue 
readers, falls under this category. 

• Unauthorized tag tracking (Attack on confidentiality): A 
tag can be traced through rogue readers, which may result 
in the disclosure of sensitive information such as a 
person's address. 

• Replay attacks (Attack on availability): In this type of 
impersonation attack, the attacker uses a tag's response 
to a rogue reader's challenge to impersonate the tag [4]. 
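
The replay attack in the last item can be illustrated with a 
Python sketch of a challenge-response exchange. It assumes that the 
tag and the legitimate reader share a secret key and that the 
reader issues a fresh random challenge for every session; the 
function names are hypothetical and the sketch does not follow any 
particular RFID standard. 

```python
import hashlib
import hmac
import secrets


def new_challenge() -> bytes:
    """Reader side: generate a fresh random challenge for each session."""
    return secrets.token_bytes(16)


def tag_response(tag_key: bytes, challenge: bytes) -> bytes:
    """Tag side: answer a challenge with a keyed hash of it."""
    return hmac.new(tag_key, challenge, hashlib.sha256).digest()


def reader_accepts(tag_key: bytes, challenge: bytes, response: bytes) -> bool:
    """Reader side: accept only a response computed for this challenge."""
    expected = tag_response(tag_key, challenge)
    return hmac.compare_digest(expected, response)
```

A response captured by a rogue reader for one challenge is not 
accepted when it is replayed against a different challenge, 
whereas a tag that always returns the same static identifier could 
be impersonated by simply replaying that identifier. 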


F. Study of E-commerce Security Issues and Solutions 


E-commerce security is a part of the Information Security 
framework and is specifically applied to the components 
that affect e-commerce, including computer security, data 
security and other wider realms of the Information Security 
framework. E-commerce security has its own particular 
nuances and is one of the most visible security components 
affecting end users through their daily payment interactions 
with businesses. 

E-commerce security is the protection of e-commerce 
assets from unauthorized access, use, alteration, or 
destruction. The dimensions of e-commerce security are 
integrity, non-repudiation, authenticity, confidentiality, 
privacy and availability. E-commerce offers the banking 
industry great opportunity, but it also creates a set of new 
risks and vulnerabilities such as security threats. 

Security is very important in online shopping sites. 
Nowadays, a huge amount is purchased on the Internet because 
it is easier and more convenient. Almost anything can be 
bought, such as music, toys, clothing, cars, food and even 
porn. Some of the popular websites are eBay, iTunes, Amazon, 
HMV, Mercantila, Dell, Best Buy and many more [6]. 

i) E-Commerce Security Tools: The e-commerce security
tools are:
• Firewalls (software and hardware)
• Public key infrastructure
• Encryption software
• Digital certificates
• Digital signatures
• Biometrics (retinal scan, fingerprints, voice, etc.)
• Passwords
• Locks and bars (network operations centers)


ii) Purpose of Security: The purposes of security are:

• Data confidentiality: provided by encryption and
decryption.

• Authentication and identification: ensuring that
someone is who he or she claims to be;
implemented with digital signatures.

• Access control: governs what resources a user
may access on the system; uses valid IDs and
passwords.

• Data integrity: ensures that information has not been
tampered with; implemented by message
digest or hashing.

• Non-repudiation: ensures that a party cannot deny
a sale or purchase; implemented with digital
signatures.
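
The data-integrity item above can be illustrated with a minimal sketch. It assumes SHA-256 as the message digest and a made-up order record; the function names are hypothetical and are not taken from the surveyed work. The receiver recomputes the digest and compares it with one obtained through a trusted channel.

```python
import hashlib
import hmac


def message_digest(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unmodified(data: bytes, expected_digest: str) -> bool:
    """Check received data against a digest obtained through a trusted channel."""
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(message_digest(data), expected_digest)


# Hypothetical order record, used only for illustration.
order = b"order_id=1001;item=book;amount=499"
digest = message_digest(order)

# A record altered in transit no longer matches the original digest.
tampered = b"order_id=1001;item=book;amount=1"
print(is_unmodified(order, digest))
print(is_unmodified(tampered, digest))
```

A plain digest only detects modification when the digest itself is protected; that is why the list above pairs hashing with digital signatures for authentication and non-repudiation.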


iii) Security Threats: Three types of security threats are:

• Denial of service
• Unauthorized access
• Theft and fraud


iv) Denial of Service (DoS):

• The two primary types of DoS attacks are spamming
and viruses.

• Spamming: sending unsolicited commercial
emails to individuals.

• Viruses: self-replicating computer programs
designed to perform unwanted events.

• Worms: special viruses that spread using direct
Internet connections.

• Trojan horses: programs disguised as legitimate software
that trick users into running them.


v) Unauthorized access:

• Illegal access to systems, applications or data.

• Passive unauthorized access: listening to a
communications channel to find secrets; the
content may be used for damaging purposes.

• Active unauthorized access: modifying a system or
its data.

• Message stream modification: changes the intent of
messages, e.g., to abort or delay
a negotiation on a contract.

• Masquerading or spoofing: sending a message that
appears to be from someone else.


vi) Theft and fraud:

• Fraud occurs when the stolen data is used or
modified.


vii) Secure Online Shopping Guidelines

• Shop at Secure Web Sites: Secure sites use
encryption technology to transfer information from
your computer to the online merchant’s computer.

• Be Aware of Cookies and Behavioural Marketing:
Cookies are small pieces of data stored by
Internet browsers that can be used to track
which sites we visit as we browse the Web.

• Never Give Out Your Social Security Number:
Providing your Social Security number is not a
requirement for placing an order at an online
shopping site.

• Always Print or Save Copies of Your Orders.

• Shop with Companies Located in the United
States: When you shop within the U.S., you are
protected by state and federal consumer laws.

• Learn the Merchant’s Cancellation, Return and
Complaint-Handling Policies.

• Be Cautious with Electronic Signatures.

• Know How Online Auctions Operate: The key to safely using
an online auction site is to read the terms of use,
which will outline key issues such as whether
the seller or the site is responsible for any
problems that arise.


viii) List of Protection Measures

• Change your passwords from time to time.

• Don’t choose a password that you use anywhere
else.

• Get regular audits. These services usually come
with an icon that you can put on your store, and
they have been known to boost sales.

• Apply updates to your shopping cart when
available.

• Apply security patches to your shopping cart
when available.

• Sign up with a managed firewall service. These
services usually come with an icon that you can
put in your store, and they have been known to
boost sales. They are not free, though.

• Choose a shopping cart that records IP addresses in the
admin and store sections.

• Choose a shopping cart that can blacklist (block)
IP addresses and users.
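
The last two measures, recording and blocking IP addresses, can be sketched with Python's standard `ipaddress` module. This is only an illustration of the idea: the `IpBlocklist` class and its method names are hypothetical and do not correspond to any particular shopping cart product, and the addresses come from ranges reserved for documentation.

```python
import ipaddress


class IpBlocklist:
    """A tiny in-memory blocklist of single addresses and CIDR ranges."""

    def __init__(self):
        self._networks = []

    def block(self, cidr: str) -> None:
        # strict=False accepts a host address such as "198.51.100.7" as a /32.
        self._networks.append(ipaddress.ip_network(cidr, strict=False))

    def is_blocked(self, address: str) -> bool:
        ip = ipaddress.ip_address(address)
        return any(ip in network for network in self._networks)


blocklist = IpBlocklist()
blocklist.block("203.0.113.0/24")   # block a whole range
blocklist.block("198.51.100.7")     # block a single address

print(blocklist.is_blocked("203.0.113.45"))
print(blocklist.is_blocked("192.0.2.1"))
```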


III. CONCLUSION 


The security of wireless networks is actually very good when the
latest encryption technology is used. That is not to say they cannot be hacked
into; they can, and because they are more open in terms of accessibility,
the danger is certainly greater. This can be overcome by
implementing strong passwords and both hardware and software
security solutions. There are, of course, businesses that hold
particularly desirable data that hackers would like to get their
hands on, such as banks, with all of the personal
financial information they hold. For these organisations, that
sort of sensitive data warrants maximum security.

According to
this survey, future research in the e-mail
security field is directed at setting up a highly efficient security
system that provides the following:


1) Assuring the true identity of the e-mail user to prevent
unauthorized access and different attacks.

2) Encapsulating messages by applying multiple levels
of encryption to ensure their confidentiality and prevent any
modification.

3) Protecting e-mail transfer from sender to receiver via
a secure channel.

4) Applying a mechanism to sent e-mails that destroys
the message after several failed attempts to
decrypt it.

5) Applying a client-side security system to provide more
restrictions and to keep transactions effective and secure
even if the security of the server is compromised.


We have surveyed the security flaws existing in the
Internet of Things that may prove very detrimental to
the development and implementation of IoT in different
fields. Adopting sound security measures that counter the
detailed security flaws, implementing various
intrusion detection systems and cryptographic and steganographic
security measures in the information exchange process, and
using efficient methods for communication will result in a
more secure and robust IoT infrastructure.

E-commerce is widely considered to be the buying and selling
of products over the internet, but any transaction that is
completed solely through electronic means can be considered
e-commerce. E-commerce and M-commerce
play an increasingly important role in online retail marketing, and the number of people
using this technology is increasing day by day all over the
world. E-commerce security is the protection of e-commerce
assets from unauthorized access, use, alteration, or destruction.
Fraudsters are constantly looking to take advantage of online
shoppers prone to making novice errors. Common mistakes
that leave people vulnerable include shopping on websites that
aren’t secure, giving out too much personal information, and
leaving computers open to viruses. Using secured websites
with proper security and privacy policies makes online
transactions easier.
Abstract—This paper reviews big data challenges from a data
management perspective. The growth of the data volume in our
digital world seems to outpace the advance of our computing
infrastructure. Conventional data processing technologies, such
as databases and data warehouses, are becoming inadequate for
the amount of data we want to deal with. Big data should
include data sets with sizes beyond the ability of commonly used
software tools to capture, manage, and process the data within a
tolerable elapsed time. Based on this concept, researchers have
summarized three important aspects of big data that go beyond
the ability of our current data processing technology. They are
Volume, Velocity and Variety, also known as the 3Vs. The paper also
studies the development of an analytical framework with the ability of
in-memory processing to extract and analyze structured and
unstructured Twitter data. The proposed framework includes data
ingestion, stream processing, and data visualization components.


Index Terms—Big data, Social media, Big data analysis,
Facebook data, Twitter streaming, Challenges


I. INTRODUCTION 


Since information technology is transforming the way we
live, our collection of digital data has started to grow rapidly.
Today, a tremendous amount of data is generated every day
in the sectors of manufacturing, business, science and our
personal lives. Proper processing of the data could reveal new
knowledge about our market, society and environment, and
enable us to react to emerging opportunities and changes in a
timely manner. However, the growth of the data volume in our
digital world seems to outpace the advance of our computing
infrastructure. Conventional data processing technologies,
such as databases and data warehouses, are becoming inadequate
for the amount of data we want to deal with.

This new challenge is known as big data. Due to its
importance and commonness, it has gained enormous attention
in recent years. There is no commonly accepted
definition of big data, though people usually believe that
big data should include data sets with sizes beyond the ability
of commonly used software tools to capture, manage, and
process the data within a tolerable elapsed time. Based on
this concept, researchers have summarized three important
aspects of big data that go beyond the ability of our current
data processing technology. They are Volume, Velocity and
Variety, also known as the 3Vs.


Siji K B 
With massive amounts of data being accumulated from
various sources, analysis of big data is vastly important for
decision making of truly any kind, whether for business,
scientific study, or the improvement of technology, to name
a few examples. Moreover, real-time applications rely upon
instantaneous input and fast analysis to arrive at a decision
or action within a short and very specific timeline. Originally,
data analytics was performed after storing data on hard
disks, which have a fair amount of access latency.
Dealing with large amounts of structured and unstructured data
in real time makes hard disks undesirable; as a result, there
has been a recent transition from hard disk drive storage to
memory storage. In-memory processing significantly decreases
access latency, which plays a crucial role
when real-time analytics is performed.

Analyzing data in real time requires data ingestion and
processing of the stream of data before the data storage step.
Some of the applications of real-time data analytics are
surveillance, environment, health care, business intelligence,
marketing, visualization, cyber security, and social media.
This study presents a real-time data analytics framework for
analyzing Twitter data. Current real-time methodologies
process Twitter data with tools and technologies that
use event processing and one-message-at-a-time analysis.
This makes it possible to achieve real-time results, but lacks
the ability to do anything more than plain processing.
The proposed framework offers an infrastructure for real-time
processing with the ability to extend the analytical
capability.

This paper also reviews the background and state of the art of big data. We first
introduce the general background of big data and review related
technologies, such as cloud computing, the Internet of Things, data
centres, and Hadoop. We then focus on the four phases of the
value chain of big data, i.e., data generation, data acquisition,
data storage, and data analysis. For each phase, we introduce
the general background, discuss the technical challenges, and
review the latest advances. We finally examine several
representative applications of big data, including enterprise
management, the Internet of Things, online social networks,
medical applications, collective intelligence, and the smart grid. These
discussions aim to provide readers with a comprehensive overview and
big picture of this exciting area. The survey is
concluded with a discussion of open problems and future
directions.

Big data contains a large amount of unstructured data in the
form of movie data, Facebook data, industry data and so
on. A large number of posts about movies are posted on Twitter
by different users. Some of these posts
may be inappropriate. The posts contain negative comments
as well as positive comments about movies, and it is difficult to
distinguish between large numbers of positive and negative posts. To
overcome this kind of problem, we propose a rating-based
mechanism that distinguishes abnormal posts with the help of
user ratings. If the rating is positive then the post is normal; otherwise
it is abnormal. To implement the proposed mechanism we used
the Hadoop platform and the MapReduce paradigm.

We organize our paper into three parts: big data challenges,
big data varieties, and analysis of big data
on Facebook and Twitter.


A. Big Data Challenges


Despite the various big-volume issues, there is still no
agreement on the quantification of big data. Such quantification
depends on various factors. First, the complexity of the data
structure is an important factor. A relational dataset of several
petabytes may not be called big data, since it can be readily
handled by today's DBMSs. In contrast, a graph dataset of
several terabytes is commonly regarded as big data, as graph
processing is very challenging for our technologies. Second,
the requirements of target applications should be considered
as a factor too. In scientific research, a waiting time of several
hours is usually acceptable for a biologist. Most automated
trading systems, in contrast, require sub-second response times
regardless of how big the data is.

The challenge of velocity comes with the need to handle
the speed with which new data is created or existing data is
updated. This issue particularly applies to machine-generated
data, such as that generated by sensing or mobile devices,
which are being deployed everywhere. In those applications,
large amounts of new and updated data flow into the systems
relentlessly, while we require the systems to make sense of
the data immediately upon its creation. Data velocity brings
challenges to every layer of a data management platform.
Both the storage layer and the query processing layer need
to be extremely fast and scalable. The technology of data
streaming has been investigated for several years to handle
high velocity. However, the capacity of the existing streaming
systems is still limited, especially when dealing with the
increasing volume of incoming data in today's sensor networks,
telecommunication systems, etc.

In real-world applications, data often does not come from a
single source. Big data implementations require handling data
from various sources, in which data can be of different formats
and models. This brings forth the challenge of data variety. The
variety of data provides more information to solve problems
or to provide better service. The question is how to capture
the different types of data in a way that makes it possible to
correlate their meanings. Typically, data can be classified into
three general types: structured data, semi-structured data and
unstructured data. There are sophisticated technologies
to deal with each of these data types, such as those of databases
and information retrieval. However, a seamless integration of
these technologies remains a challenge.


B. Big Data Variety 


i) User generated contents (UGCs) from applications 
with massive users. Examples are tweets, blogs, dis- 
cussions, photos/videos posted and shared by users of 
many Web 2.0 applications. The data of these applica- 
tions are directly contributed by users, and therefore, 
they are typically unstructured for user convenience. 

ii) Transactional data that are generated by a large 
scale system due to massive operations/transactions 
processed by the system. Examples of big transac- 
tional data are Web logs, business transactions, feeds 
of moving objects, reports of sensor networks, reads 
of radio-frequency identifications. These data are typ- 
ically structured with predefined schemas. They are 
often accumulated in a streaming manner. 

iii) Scientific data that are collected from data-intensive 
experiments or applications. Examples are celestial 
data, high-energy physics data, genome data, health- 
care data. Types of scientific data are very application- 
dependent, ranging from structured data (e.g., time 
series data) to semi-structured data (e.g., XML data) 
and unstructured data (e.g., images). In addition to 
the original data, provenance data (recording how data 
are generated and transformed) are very important for 
scientific data management. 

iv) Web data that are crawled and processed to support
applications such as Web search and mining. As the
World Wide Web contains billions of pages, it is quite
easy to generate a huge Web corpus of numerous
unstructured Web pages. Behind the Web pages, there
is also a huge amount of deep web data, which
is even more important than the surface contents.
It can also be crawled and integrated as an important
source of Web data.

v) Graph data that are formed by a very large number
of information nodes and the links between the nodes.
Examples are social networks and RDF knowledge
bases.


C. Big Data Analysis 


Twitter is an online social networking service with more
than 300 million users, generating a huge amount of
information every day. Twitter's most important characteristic is
that users can tweet about events, situations, feelings,
opinions, or even something totally new, in real time. Currently
there are different workflows offering real-time data analysis
for Twitter, presenting general processing over streaming data.

This study attempts to develop an analytical framework
with the ability of in-memory processing to extract and analyze
structured and unstructured Twitter data.
The proposed framework includes data ingestion, stream
processing, and data visualization components, with the
Apache Kafka messaging system used to perform the
data ingestion task. Furthermore, Spark makes it possible to
run sophisticated data processing and machine learning
algorithms in real time.



• Real-time data analytics framework: This framework has
some characteristics that distinguish it from traditional
data analytics approaches. The main idea is that
methods are needed to analyze thousands of tweets
arriving each second in a short amount of time. Also,
the framework should be independent of the imported data
volume; this is important because the volume of tweets
is growing at a noticeable rate. The concept is to
collect event streams on different nodes and let multiple
processing nodes analyze the data in parallel. The
challenge is therefore how to manage streaming data and how
to analyze it over the cluster.


A data processing workflow usually connects computing
resources to automate a sequence of tasks over large
volumes of data, with different resources connected
to automate different tasks. In the case of streaming data
processing, a scalable and distributed platform is required for
combining large volumes of historic and streaming data at the
same time.

The framework consists of three sections:

• Data ingestion
• Data processing
• Data visualization

The data ingestion section connects directly to the Twitter
streaming API and imports data into the framework in a scalable
manner. The data processing section, which is capable of stream
processing over a cluster, accesses the distributed imported data,
analyzes it in memory, performs processing tasks on
it, and finally sends the results to be monitored.


i) Data Ingestion:
Apache Kafka is a distributed streaming platform that
uses publish-subscribe messaging and is developed to
be a distributed, partitioned, replicated service. Our
framework uses this message brokering system. To
balance the incoming load, topics are defined and each
topic is split into multiple partitions, with each
broker storing one or more of those partitions. Kafka
can accept multiple formats, varying from text to images,
video and other formats. This is an essential
requirement for big data systems that deal with unstructured
data.
ii) Streaming Data Processing:
Spark consists of a core, which is the distributed
execution engine, and a set of libraries. These libraries, built on the core,
allow various workloads for streaming, SQL, and
machine learning, and let Spark perform
more sophisticated processing. For example,
machine learning algorithms are often iterative, and
Spark's ability to cache the dataset in memory helps
enormously to speed up iterative tasks.
Spark Streaming is an
extension of the core Spark API that enables scalable,
high-throughput, fault-tolerant stream processing of
live data streams. Data may be ingested from many
sources like Kafka, Flume, or TCP sockets, and may
be processed using complex algorithms expressed with
high-level functions such as map, reduce, join, and
window. Finally, processed data may be pushed out to
filesystems, databases, and live dashboards.
iii) Data Visualization and Storage:

Here the results, as well as the data streams, may have
to be stored or visualized. The storage should be
a NoSQL database, since tweets come in different
formats, ranging from text to images to videos. Data
stored in the database may be used later for historical data
analysis. However, the value of Twitter data usually
lies in the current situation and may be much
different at another time and in other circumstances.
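
The ingestion and stream-processing steps described above can be sketched in plain Python. The sketch does not use the Kafka or Spark APIs. It only imitates two ideas: routing messages to partitions by key, and a map/reduce over fixed time windows. The partition count, window length, CRC32 hash and sample tweets are all assumptions made for illustration.

```python
import zlib
from collections import Counter, defaultdict

NUM_PARTITIONS = 3    # assumed partition count for a hypothetical topic
WINDOW_SECONDS = 10   # assumed window length


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition (CRC32 stands in for the broker's hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def ingest(tweets):
    """Group (timestamp, user, text) tuples by partition, keyed on the user."""
    partitions = defaultdict(list)
    for timestamp, user, text in tweets:
        partitions[partition_for(user)].append((timestamp, text))
    return partitions


def hashtag_counts_per_window(partitions, window: int = WINDOW_SECONDS):
    """Map each tweet to its hashtags, then reduce to counts per time window."""
    windows = defaultdict(Counter)
    for messages in partitions.values():
        for timestamp, text in messages:
            window_start = (timestamp // window) * window
            hashtags = [w.lower() for w in text.split() if w.startswith("#")]
            windows[window_start].update(hashtags)
    return dict(windows)


# Made-up sample data: (seconds since start, user, text).
tweets = [
    (1, "alice", "Great match #football"),
    (4, "bob", "#Football tonight"),
    (12, "alice", "New phone #tech"),
]
counts = hashtag_counts_per_window(ingest(tweets))
print(counts)
```

In a real deployment the partitions would live on separate brokers and the windowed counting would run in parallel on the cluster; here both steps run in one process so that the data flow is easy to follow.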


Matches with high importance will result in higher TV
ratings and more spectators than matches with low
importance. There is a positive correlation between the number
of spectators and TV ratings. Matches played on weekdays
will have fewer spectators and lower TV ratings than matches
played on weekends. Videos will generate more activity than
pictures, which in turn will generate more activity than status
updates, links and events. Matches played during periods with
high Facebook activity [3] will have more spectators and
higher TV ratings than matches played during periods with
less Facebook activity.


II. METHODOLOGY


Two data sets were used in this paper:


i) Facebook data
ii) Match data


The first data set contained data from the Danish national
team's official Facebook page. The raw data consisted of a
little more than 2.1M data points, where each row is
equivalent to an action on the Facebook page. The data contains
information on action type (whether it is a post, comment
or like), actor name and ID, timestamp, type of post, and,
if relevant, links and text value for posts and comments. The
second data set, about matches, contained information such as
the date of the match and the number of spectators. The
spectators and TV ratings of a football match can be predicted from Facebook data
by observing the posts, comments, likes, etc. Observing the
posts, likes and comments shows that sharing of a post increases as the number of days
increases, so that it reaches as many people
as possible; hence views, likes and comments increase. If a
match is posted today, the viewers are few, but after some days or weeks
the number of viewers increases and there are more likes and comments
than on the first day.

A graph is drawn from the overall day-wise distribution of TV ratings
and total spectators on working days and non-working days.
Monday through Thursday was coded as 0 for
weekday, and Friday to Sunday as 1 for weekend. The number of viewers
on non-working days is higher than on working
days.
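
The weekday/weekend coding described above can be written as a small helper. This is a sketch under the stated coding rule (Monday to Thursday as 0, Friday to Sunday as 1); the function name and the sample dates are illustrative and are not taken from the match data set.

```python
from datetime import date


def weekend_flag(match_date: date) -> int:
    """Return 0 for Monday-Thursday and 1 for Friday-Sunday."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    return 1 if match_date.weekday() >= 4 else 0


# Illustrative dates only: a Wednesday, a Friday and a Sunday.
sample_dates = [date(2018, 4, 4), date(2018, 4, 6), date(2018, 4, 8)]
flags = [weekend_flag(d) for d in sample_dates]
print(flags)
```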

Consumers do not take the time to watch videos on Facebook.
The extra time and cost it takes for Landsholdet to produce
videos is thus not worthwhile, and it is suggested that they
decrease the number of videos on their Facebook wall.


III. PATHOLOGIES OF BIG DATA 


This report deals with the topic of big data.
The pathologies of big data arise when large amounts of data are stored and processed in the traditional
way. This involves random and sequential access on disk and in
memory, and the size of data a system can handle. Memory and
CPU limitations are the major factors in software limits; memory
size, disk size, and processor speed are the practical limits. A
database on the order of 100 GB would not be considered
trivially small even today, although hard drives capable of
storing 10 times as much can be had for less than $100
at any computer store. The U.S. Census database included
many different datasets of varying sizes, but let us simplify
a bit: 100 gigabytes is enough to store at least the basic
demographic information (age, sex, income, ethnicity, language,
religion, housing status, and location, packed in a 128-bit
record) for every living human being on the planet. This would
create a table of 6.75 billion rows and maybe 10 columns, which
is itself an example of big data.
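
The size estimate above can be checked with a few lines of arithmetic. The row count and record width come from the text; the only added assumption is the unit convention, so the result is shown both in decimal gigabytes and in binary gibibytes.

```python
ROWS = 6_750_000_000   # one row per living person, as stated in the text
RECORD_BITS = 128      # packed demographic record, as stated in the text

record_bytes = RECORD_BITS // 8        # 16 bytes per row
total_bytes = ROWS * record_bytes      # 108,000,000,000 bytes

total_gb = total_bytes / 10**9         # decimal gigabytes
total_gib = total_bytes / 2**30        # binary gibibytes

print(f"{total_gb:.1f} GB, {total_gib:.1f} GiB")
```

The table comes to 108 GB in decimal units, or roughly 100.6 GiB, which is consistent with the "100 gigabytes" figure quoted in the text.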


IV. PHASES OF BIG DATA


A. Big Data Generation


Data generation is the first step of big data. The main sources
of big data are the operation and trading information in
enterprises, logistic and sensing information in the IoT, human
interaction information and position information in the Internet
world, and data generated in scientific research.


B. Big Data Acquisition 


As the second phase of the big data system, big data
acquisition includes data collection, data transmission, and
data pre-processing.


• Data collection:
Data collection uses special data collection
techniques to acquire raw data from a specific data generation
environment.

• Data transportation:
Upon the completion of raw data collection, the data is
transferred to a data storage infrastructure for processing
and analysis. Big data is mainly stored in a data center.

• Data pre-processing:
Because of the wide variety of data sources, the
collected datasets vary with respect to noise, redundancy,
consistency, etc. Data pre-processing
techniques are therefore applied to the collected datasets.


C. Big Data Storage 


Various storage systems have emerged to meet the demands of
massive data. Existing massive storage technologies can be
classified as Direct Attached Storage (DAS) and network
storage, while network storage can be further classified into
Network Attached Storage (NAS) and Storage Area Network
(SAN).

In DAS, various hard disks are directly connected to
servers, and data management is server-centric. DAS is only
suitable for interconnecting servers on a small scale. NAS is
auxiliary storage equipment attached to a network and is
network-oriented, whereas SAN is specially designed for data storage
with a scalable and bandwidth-intensive network.

Distributed storage systems are used to store massive data
for efficient data processing and analysis.


D. Big Data Analysis 


• Structured data analysis
• Text data analysis
• Web data analysis
• Multimedia data analysis
• Network data analysis
• Mobile data analysis


V. HADOOP 


In the world of big data analytics, Hadoop plays a very
important role. Hadoop is open source software that runs on
the Linux platform. In today’s era, data is growing and varying
at a very fast rate, so getting valuable information from the data has
itself become a challenge. This huge amount of data is termed
big data, and it can be in structured, semi-structured or
unstructured form. Structured data can be handled by a relational
database, but for semi-structured and unstructured data we
need a non-relational database [4] [5]. Satellite images, scientific
data, reviews, and comments are some examples of unstructured
data. Hadoop is one of the solutions for big data analysis and
processing. Hadoop [7] has two components: HDFS
and MapReduce. HDFS is the Hadoop file system,
which stores data by breaking it down into fixed-size blocks
of 64 MB. MapReduce is a processing paradigm that works
in two steps: Map and Reduce. Map splits the given input
into key/value pairs, and Reduce produces the desired output
by taking the output of the map step as input.
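
The two ideas in this paragraph, fixed-size blocks and the Map/Reduce steps, can be sketched in plain Python. Nothing here uses Hadoop itself: the function names are hypothetical, the 64 MB block size is the figure quoted above, and the word count is the customary toy example of the paradigm.

```python
import math
from collections import defaultdict


def hdfs_block_count(file_size_mb: float, block_size_mb: int = 64) -> int:
    """Number of fixed-size blocks needed to store a file of the given size."""
    return math.ceil(file_size_mb / block_size_mb)


def map_phase(line: str):
    """Map step: split a line of input into (word, 1) key/value pairs."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group the values emitted by the map step by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


lines = ["big data needs hadoop", "hadoop stores big files"]
pairs = [pair for line in lines for pair in map_phase(line)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

print(hdfs_block_count(200))   # a 200 MB file occupies 4 blocks of 64 MB
print(word_counts)
```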

The data set used for sentiment analysis contains reviews of
different movies. The proposed method works in two steps. In the
first step, the input is loaded into HDFS (Hadoop Distributed
File System), which stores it, and in the second
step MapReduce converts the text reviews into key/value
pairs and produces the desired output by taking the output of the map
step as input [6]. The data set we used was taken from
Amazon.com. A positive and negative word dictionary is used
to identify positive and negative words, and a mechanism is proposed
to classify normal and abnormal posts.
HDFS stores the data in DataNodes, and the
metadata is stored in the NameNode. The MapReduce paradigm is
used for processing the data set: it breaks the data set values
into key/value pairs, which are then processed
against the positive and negative word lists.
The sentiment, positive or negative, is evaluated, and a post
is classified as normal if it matches the positive word list and
abnormal if it matches the negative word list.
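
The classification step described above can be sketched as follows. This is an illustration only, not the implementation used in the paper: the word lists and sample posts are invented, and treating a post with no net positive score as abnormal is an assumption based on the "positive is normal, otherwise abnormal" rule stated earlier.

```python
from collections import defaultdict

# Hypothetical word lists; a real dictionary would be far larger.
POSITIVE_WORDS = {"good", "great", "excellent", "love"}
NEGATIVE_WORDS = {"bad", "poor", "boring", "hate"}


def map_post(post_id: str, text: str):
    """Map step: emit (post_id, score) pairs for sentiment-bearing words."""
    yield post_id, 0   # ensures every post reaches the reduce step
    for word in text.lower().split():
        token = word.strip(".,!?")
        if token in POSITIVE_WORDS:
            yield post_id, 1
        elif token in NEGATIVE_WORDS:
            yield post_id, -1


def reduce_post(scores) -> str:
    """Reduce step: a positive total means normal, anything else abnormal."""
    return "normal" if sum(scores) > 0 else "abnormal"


def classify(posts: dict) -> dict:
    grouped = defaultdict(list)
    for post_id, text in posts.items():
        for key, score in map_post(post_id, text):
            grouped[key].append(score)
    return {post_id: reduce_post(scores) for post_id, scores in grouped.items()}


posts = {   # made-up sample posts
    "p1": "Great movie, I love it!",
    "p2": "Boring plot and bad acting.",
}
labels = classify(posts)
print(labels)
```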

As relational databases are unable to handle unstructured or
semi-structured data, Hadoop has emerged as one of the biggest innovations
to hit the data analytics and management industries. It is a
combination of two important but separate sub-projects: one
is MapReduce, used as a parallel-processing framework, and
the second is HDFS [8], a file system used for reliable storage
of data.


VI. TOOLS USED 


Data from the Facebook page of Landsholdet were collected using the
tool SODATO [1] [2], and TV ratings were collected from
TNS Gallup. The two data sources were then combined using
the Tableau and SAS Studio tools.


VII. CONCLUSION 


We understand that big data consists of large amounts of
data as well as varieties of data. It is an abstract concept; apart
from masses of data, it also has other features. Assuming that
increased activity leads to more spectators and higher TV ratings,
DBU can improve its social media marketing strategy
by getting a better return on its video posts. Generally
speaking, data is increasing at an exponential rate nowadays,
while the corresponding information technology lags
behind. Hence there is much remaining work for us
to do on the data so that we can face the challenges
brought by big data. We also focused on the four phases of the
value chain of big data, i.e., data generation, data acquisition,
data storage, and data analysis. The analysis mechanism has been
implemented on the Hadoop platform using the MapReduce model.